Abstract. For a finite measure $\Lambda$ on $[0,1]$, the $\Lambda$-coalescent is a coalescent process such that, whenever there are $b$ clusters, each $k$-tuple of clusters merges into one at rate $\int_0^1 x^{k-2}(1-x)^{b-k}\,\Lambda(dx)$. It has recently been shown that if $1<\alpha<2$, the $\Lambda$-coalescent in which $\Lambda$ is the Beta$(2-\alpha,\alpha)$ distribution can be used to describe the genealogy of a continuous-state branching process (CSBP) with an $\alpha$-stable branching mechanism. Here we use facts about CSBPs to establish new results about the small-time asymptotics of beta coalescents. We prove an a.s. limit theorem for the number of blocks at small times, and we establish results about the sizes of the blocks. We also calculate the Hausdorff and packing dimensions of a metric space associated with the beta coalescents, and we find the sum of the lengths of the branches in the coalescent tree, both of which are determined by the behavior of coalescents at small times. We extend most of these results to other $\Lambda$-coalescents for which $\Lambda$ has the same asymptotic behavior near zero as the Beta$(2-\alpha,\alpha)$ distribution. This work complements recent work of Bertoin and Le Gall, who also used CSBPs to study small-time properties of $\Lambda$-coalescents.

Résumé. The object of this work is the study of the small-time asymptotic behavior of Beta-coalescents. These processes describe the scaling limit of the genealogy of a number of models in population genetics. In particular, we give an almost sure convergence theorem for the renormalized number of blocks. We also describe the asymptotic behavior of the sizes of the blocks. These results allow us to compute the Hausdorff dimension and the packing dimension of a metric space associated with this type of coalescent, as well as the total length of the branches of the coalescent tree. This last result answers a question that arises in population genetics. Finally, these results are partly extended, by coupling arguments, to $\Lambda$-coalescents for which the measure $\Lambda$ behaves near 0 like a Beta distribution. The methods used rest essentially on a link between Beta-coalescents and continuous-state branching processes.
1. Introduction

Coalescent processes are stochastic models of a system of particles that start out separated and then merge 
into clusters as time goes forward. Coalescent processes have applications to areas such as physical chemistry, 
where one can think of the merging of physical particles, astronomy, where we have the merging of galaxies 
into clusters, and biology, where ancestral lines of a sample from a population merge as we go backward in 
time. See [1, 5] for surveys. 

Much work on coalescence has focused on processes in which only two clusters can merge at a time. 
However, Pitman [30] and Sagitov [31] introduced coalescents with multiple collisions, in which many clusters 
can merge at once into a single cluster. To define these processes precisely, let $\mathcal{P}_n$ be the set of partitions of $\{1,\dots,n\}$, and let $\mathcal{P}$ be the set of partitions of $\mathbb{N}$. For all partitions $\pi\in\mathcal{P}$, let $R_n\pi$ be the restriction of $\pi$ to $\{1,\dots,n\}$, meaning that $R_n\pi\in\mathcal{P}_n$, and two integers $i$ and $j$ are in the same block of $R_n\pi$ if and only if they are in the same block of $\pi$. A coalescent with multiple collisions is a $\mathcal{P}$-valued Markov process $(\Pi(t),t\ge 0)$ such that $\Pi(0)$ is the partition of $\mathbb{N}$ into singletons and, for all $n\in\mathbb{N}$, the process $(R_n\Pi(t),t\ge 0)$ is a $\mathcal{P}_n$-valued Markov process with the property that whenever there are $b$ blocks, each transition that involves merging $k$ blocks of the partition into one happens at rate $\lambda_{b,k}$, and these are the only possible transitions. The rates $\lambda_{b,k}$ depend neither on $n$ nor on the numbers of integers in the $b$ blocks. Pitman showed that the transition rates must satisfy

$$\lambda_{b,k}=\int_0^1 x^{k-2}(1-x)^{b-k}\,\Lambda(dx) \tag{1}$$

for some finite measure $\Lambda$ on $[0,1]$, and the coalescent process such that (1) holds for a particular measure $\Lambda$ is called the $\Lambda$-coalescent. When $\Lambda$ is a unit mass at zero, each transition involves the merger of exactly two blocks, and each such transition occurs at rate 1. This process is known as Kingman's coalescent and was introduced in [25].
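For the beta case the integral in (1) has a closed form: with $\Lambda(dx)=x^{1-\alpha}(1-x)^{\alpha-1}dx/B(2-\alpha,\alpha)$, one gets $\lambda_{b,k}=B(k-\alpha,\,b-k+\alpha)/B(2-\alpha,\alpha)$, where $B$ is the Beta function. The Python sketch below evaluates these rates through `math.lgamma`. The function names are our own. It checks two facts that follow directly from (1). First, $\lambda_{2,2}=\Lambda([0,1])=1$ for a probability measure. Second, the consistency relation $\lambda_{b,k}=\lambda_{b+1,k}+\lambda_{b+1,k+1}$ holds, which comes from writing $1=(1-x)+x$ inside the integral.

```python
import math


def log_beta(p: float, q: float) -> float:
    """Logarithm of the Beta function B(p, q), via log-gamma to avoid overflow."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def beta_rate(b: int, k: int, alpha: float) -> float:
    """lambda_{b,k} from (1) when Lambda is the Beta(2 - alpha, alpha) distribution.

    Multiplying x^{k-2} (1-x)^{b-k} by the Beta density and integrating gives
    B(k - alpha, b - k + alpha) / B(2 - alpha, alpha).
    """
    if not (2 <= k <= b) or not (0 < alpha < 2):
        raise ValueError("need 2 <= k <= b and 0 < alpha < 2")
    return math.exp(log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha))


alpha = 1.5
print(beta_rate(2, 2, alpha))  # total mass of Lambda
# consistency relation lambda_{5,3} = lambda_{6,3} + lambda_{6,4}
print(beta_rate(5, 3, alpha), beta_rate(6, 3, alpha) + beta_rate(6, 4, alpha))
```

For $\alpha=1$ (the uniform distribution) the same function returns the Bolthausen–Sznitman rates $(k-2)!\,(b-k)!/(b-1)!$.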

There has been a considerable amount of work concerning applications of these processes. Sagitov [31] 
showed that coalescents with multiple collisions can describe the genealogy of populations in which there 
are occasionally very large families. See [29] for further results in this direction. Durrett and Schweinsberg [18] showed that coalescents with multiple collisions can be used to model the genealogy of a population that periodically experiences beneficial mutations. Schweinsberg [33] considered the genealogy of supercritical Galton–Watson processes in which the probability of having $k$ or more offspring decays like $Ck^{-\alpha}$ for some constant $C$. When $1<\alpha<2$, the genealogy of this process, as the population size tends to infinity, converges to the $\Lambda$-coalescent in which $\Lambda$ is the Beta$(2-\alpha,\alpha)$ distribution. Birkner et al. [11] established a continuous version of these results, showing that the $\Lambda$-coalescents that describe the genealogy of a continuous-state branching process (CSBP) are precisely those in which $\Lambda$ is the Beta$(2-\alpha,\alpha)$ distribution, where $0<\alpha<2$. The $\alpha=1$ case had previously been established by Bertoin and Le Gall [6].

These results suggest that the $\Lambda$-coalescents in which $\Lambda$ is the Beta$(2-\alpha,\alpha)$ distribution form an important one-parameter family of coalescents with multiple collisions that is worthy of further study. These results also suggest that it should be possible to use results about continuous-state branching processes to get new insight into the behavior of coalescent processes. The goal of this paper is to establish some results about the asymptotics of the Beta$(2-\alpha,\alpha)$-coalescents at small times. Because the small-time behavior of $\Lambda$-coalescents depends only on properties of $\Lambda$ near zero, some of our results extend easily to $\Lambda$-coalescents that have the same behavior near zero as the Beta$(2-\alpha,\alpha)$-coalescents, and we prove these results in this more general form. Note that when $\alpha=1$, the Beta$(1,1)$ distribution is the uniform distribution on $[0,1]$. The associated coalescent process, called the Bolthausen–Sznitman coalescent, has already been studied extensively (see, for example, [2, 9, 12, 22, 30]). We focus here on the case $1<\alpha<2$. Some of our results are closely related to results of Bertoin and Le Gall [8], who also used CSBPs to study the small-time behavior of $\Lambda$-coalescents.

1.1. Number of blocks 

Our first result concerns the number of blocks at small times. We say the $\Lambda$-coalescent comes down from infinity if the number of blocks is a.s. finite for all $t>0$, and stays infinite if the number of blocks is a.s. infinite for all $t>0$. It is known (see Example 15 of [32]) that the Beta$(2-\alpha,\alpha)$-coalescent comes down from infinity if $1<\alpha<2$ and stays infinite if $0<\alpha\le 1$. Theorem 1.1 gives a limit theorem for the number of blocks at small times when $1<\alpha<2$. Note that this limit holds almost surely. Independently of the present work, Bertoin and Le Gall obtained the limit in probability for a larger family of $\Lambda$-coalescents (see Lemma 3 of [8]).

Theorem 1.1. Let $\Lambda$ be a finite measure on $[0,1]$ such that $\Lambda(dx)=f(x)\,dx$, where $f(x)\sim Ax^{1-\alpha}$ for some $\alpha\in(1,2)$, and $\sim$ means that the ratio of the two sides tends to one as $x\downarrow 0$. Let $(\Pi(t),t\ge 0)$ be the $\Lambda$-coalescent, and let $N(t)$ be the number of blocks of the partition $\Pi(t)$. Then

$$\lim_{t\downarrow 0}t^{1/(\alpha-1)}N(t)=\left(\frac{\alpha}{A\,\Gamma(2-\alpha)}\right)^{1/(\alpha-1)}\quad\text{a.s.} \tag{2}$$

In particular, if $\Lambda$ is the Beta$(2-\alpha,\alpha)$ distribution, then

$$\lim_{t\downarrow 0}t^{1/(\alpha-1)}N(t)=\big(\alpha\Gamma(\alpha)\big)^{1/(\alpha-1)}\quad\text{a.s.} \tag{3}$$

To see how (3) follows from (2), note that for the Beta$(2-\alpha,\alpha)$-coalescent, we have

$$\Lambda(dx)=\frac{1}{\Gamma(2-\alpha)\Gamma(\alpha)}\,x^{1-\alpha}(1-x)^{\alpha-1}\,dx,$$

so in this case $A=1/[\Gamma(\alpha)\Gamma(2-\alpha)]$. Also, note that as $\alpha\uparrow 2$, the Beta$(2-\alpha,\alpha)$ distribution converges to the unit mass at zero. Consequently, although Theorem 1.1 is stated for $1<\alpha<2$, Kingman's coalescent can be viewed as corresponding to $\alpha=2$. Indeed, it is known for Kingman's coalescent (see Section 4.2 of [1]) that $tN(t)\to 2$ a.s. as $t\downarrow 0$. This is what one gets by plugging $\alpha=2$ into (3).
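The constants in (2) and (3) are easy to evaluate numerically. The short sketch below computes both; its function names are illustrative, not taken from the paper. It confirms that they agree when $A=1/[\Gamma(\alpha)\Gamma(2-\alpha)]$, and it shows the value approaching 2, the Kingman constant, as $\alpha\uparrow 2$.

```python
import math


def limit_constant(alpha: float, A: float) -> float:
    """Right-hand side of (2): (alpha / (A Gamma(2 - alpha)))^{1/(alpha - 1)}."""
    return (alpha / (A * math.gamma(2 - alpha))) ** (1 / (alpha - 1))


def beta_limit_constant(alpha: float) -> float:
    """Right-hand side of (3): (alpha Gamma(alpha))^{1/(alpha - 1)}."""
    return (alpha * math.gamma(alpha)) ** (1 / (alpha - 1))


alpha = 1.5
A_beta = 1 / (math.gamma(alpha) * math.gamma(2 - alpha))  # A for the Beta(2 - alpha, alpha) density
print(limit_constant(alpha, A_beta), beta_limit_constant(alpha))
print(beta_limit_constant(1.999999))  # close to the Kingman value 2
```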

In Section 4, Theorem 1.1 is obtained by relating the number of blocks of the coalescent to continuous-state branching processes. In [3], we present an alternative approach based on continuous stable random trees and the Kesten–Stigum theorem.
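Theorem 1.1 can also be explored by simulation. The sketch below is our own illustration, not the method of Section 4. It runs the block-counting process of the Beta$(2-\alpha,\alpha)$-coalescent restricted to $\{1,\dots,n\}$. With $b$ blocks, each of the $\binom{b}{k}$ groups of $k$ blocks merges at rate $\lambda_{b,k}$ from (1). The chain therefore waits an exponential time with rate $\sum_{k}\binom{b}{k}\lambda_{b,k}$, and then drops to $b-k+1$ blocks with probability proportional to $\binom{b}{k}\lambda_{b,k}$ (a standard Gillespie-type scheme). All function names are hypothetical.

```python
import math
import random


def log_beta(p, q):
    """Logarithm of the Beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def merger_rates(b, alpha):
    """Rates binom(b, k) * lambda_{b,k}, k = 2..b, for Lambda = Beta(2 - alpha, alpha).

    Entry k - 2 is the rate at which some group of k of the b blocks merges into one.
    """
    log_norm = log_beta(2 - alpha, alpha)
    rates = []
    for k in range(2, b + 1):
        log_binom = math.lgamma(b + 1) - math.lgamma(k + 1) - math.lgamma(b - k + 1)
        rates.append(math.exp(log_binom + log_beta(k - alpha, b - k + alpha) - log_norm))
    return rates


def simulate_block_counts(n, alpha, rng):
    """Simulate N_n(t); return the jump chain as (time, number_of_blocks) pairs."""
    t, b = 0.0, n
    path = [(t, b)]
    while b > 1:
        rates = merger_rates(b, alpha)
        t += rng.expovariate(sum(rates))                     # holding time ~ Exp(lambda_b)
        k = rng.choices(range(2, b + 1), weights=rates)[0]   # number of blocks that merge
        b -= k - 1                                           # k blocks become one
        path.append((t, b))
    return path


def blocks_at(path, t):
    """Number of blocks at time t (paths are right-continuous)."""
    count = path[0][1]
    for s, m in path:
        if s > t:
            break
        count = m
    return count


alpha, t = 1.5, 0.3
path = simulate_block_counts(300, alpha, random.Random(0))
print(t ** (1 / (alpha - 1)) * blocks_at(path, t),
      (alpha * math.gamma(alpha)) ** (1 / (alpha - 1)))
```

For fixed $t$, $N_n(t)$ increases to $N(t)$ only as $n\to\infty$, and the finite-$n$ correction is sizeable. The printed value of $t^{1/(\alpha-1)}N_n(t)$ is therefore only a rough comparison with the constant in (3). Larger $n$ and smaller $t$ (keeping $N(t)\ll n$) give a better but slower check.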



1.2. Block sizes 



We now consider the sizes of the blocks of the beta coalescents. It is clear from the definition that if $(\Pi(t),t\ge 0)$ is a $\Lambda$-coalescent, then $\Pi(t)$ is an exchangeable random partition of $\mathbb{N}$ for all $t>0$. It thus follows from results of Kingman [24] that if $B\subset\mathbb{N}$ is a block of the partition $\Pi(t)$, then the limit

$$\lim_{m\to\infty}\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}_{\{i\in B\}}$$

exists almost surely and is called the asymptotic frequency of $B$. If, for each $t>0$, the sum of the asymptotic frequencies of the blocks of $\Pi(t)$ equals one almost surely, then we say the coalescent has proper frequencies. Pitman showed (see Theorem 8 of [30]) that the $\Lambda$-coalescent has proper frequencies if and only if $\int_0^1 x^{-1}\,\Lambda(dx)=\infty$. In particular, the Beta$(2-\alpha,\alpha)$-coalescent has proper frequencies if and only if $\alpha\ge 1$. When $0<\alpha<1$, for each $t>0$, almost surely a positive asymptotic fraction of the integers will be in singleton blocks of $\Pi(t)$, so the sum of the asymptotic frequencies will be less than one. For coalescents with proper frequencies, almost surely $\Pi(t)$ has no singletons for all $t>0$.

If $(\Pi(t),t\ge 0)$ is a $\Lambda$-coalescent, then one can construct a ranked $\Lambda$-coalescent $(\Theta(t),t\ge 0)$ such that $\Theta(t)$ is the sequence of asymptotic frequencies of the blocks of the partition $\Pi(t)$, ranked in decreasing order. For most $\Lambda$-coalescents, there appears to be no simple description of the distribution of $\Theta(t)$ for fixed $t$. An exception is the Bolthausen–Sznitman coalescent, in which case $\Theta(t)$ has the Poisson–Dirichlet distribution with parameters $(e^{-t},0)$; see [12, 30], or see [22] for a short proof using recursive trees. Also, for Kingman's coalescent, let $T_k=\inf\{t: N(t)\le k\}$ be the first time at which the coalescent has at most $k$ blocks (for Kingman's coalescent, exactly $k$ blocks). Then the distribution of $\Theta(T_k)$ is uniform on the simplex $\Delta_k=\{(x_1,\dots,x_k): x_1\ge\dots\ge x_k\ge 0,\ x_1+\dots+x_k=1\}$, as shown in [25]. Theorem 1.2 below gives a related result for the Beta$(2-\alpha,\alpha)$-coalescents with $1<\alpha<2$. These coalescent processes may never have exactly $k$ blocks, so we calculate the conditional distribution of $\Theta(T_k)$ given $N(T_k)=k$, which is the event that the coalescent has exactly $k$ blocks at some time.

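Theorem 1.2. Let $(\Pi(t),t\ge 0)$ be the Beta$(2-\alpha,\alpha)$-coalescent, where $1<\alpha<2$, and let $(\Theta(t),t\ge 0)$ be the associated ranked coalescent. Let $N(t)$ be the number of blocks of $\Pi(t)$ at time $t$. Fix a positive integer $k$, and let $X_1',\dots,X_k'$ be i.i.d. random variables with distribution $\mu$, where the Laplace transform of $\mu$ is given by

$$\int_0^\infty e^{-\lambda x}\,\mu(dx)=1-\frac{\lambda}{(1+\lambda^{\alpha-1})^{1/(\alpha-1)}}. \tag{4}$$

Let $X_1,\dots,X_k$ be the values of $X_1',\dots,X_k'$ ranked in decreasing order, and let $S_k=X_1+\dots+X_k$. If $g:\Delta_k\to[0,\infty)$ is a nonnegative measurable function, then

$$E\big[g(\Theta(T_k))\,\big|\,N(T_k)=k\big]=\frac{E\big[S_k^{1-\alpha}\,g(X_1/S_k,\dots,X_k/S_k)\big]}{E\big[S_k^{1-\alpha}\big]}. \tag{5}$$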
To see how this result is related to the result for Kingman's coalescent, note that if $\alpha=2$, the right-hand side of (4) becomes $1/(1+\lambda)$, so $\mu$ is the exponential distribution with mean 1. Suppose $X_1,\dots,X_k$ are obtained by ranking $k$ i.i.d. random variables that have the exponential distribution with mean 1, and $S_k=X_1+\dots+X_k$. Then $S_k$ is independent of $(X_1/S_k,\dots,X_k/S_k)$. Consequently, the right-hand side of (5) becomes $E[g(X_1/S_k,\dots,X_k/S_k)]$. Furthermore, the distribution of $(X_1/S_k,\dots,X_k/S_k)$ is uniform on $\Delta_k$. Note also that $\Theta(T_k)$ is independent of $T_k$ for Kingman's coalescent because exactly two blocks coalesce during each merger. This property does not hold for other $\Lambda$-coalescents.
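The Laplace transform (4) is explicit, so the facts used here and below can be checked numerically. The sketch below evaluates (4); the function names are ours. It confirms that at $\alpha=2$ the transform reduces to $1/(1+\lambda)$. It also approximates the mean of $\mu$ through the identity $\big(1-\int e^{-\lambda x}\mu(dx)\big)/\lambda=(1+\lambda^{\alpha-1})^{-1/(\alpha-1)}$, which tends to 1 as $\lambda\downarrow 0$. That identity is written directly to avoid floating-point cancellation.

```python
def laplace_mu(lam, alpha):
    """Right-hand side of (4): E[exp(-lam X)] for X ~ mu."""
    return 1.0 - lam / (1.0 + lam ** (alpha - 1)) ** (1.0 / (alpha - 1))


def one_minus_laplace_over_lam(lam, alpha):
    """(1 - E[exp(-lam X)]) / lam in closed form; its limit as lam -> 0 is E[X]."""
    return (1.0 + lam ** (alpha - 1)) ** (-1.0 / (alpha - 1))


for lam in (0.1, 1.0, 10.0):
    # alpha = 2: mu is Exp(1), whose Laplace transform is 1 / (1 + lam)
    print(laplace_mu(lam, 2.0), 1.0 / (1.0 + lam))

# mean of mu for alpha = 1.5: the quotient tends to 1, but only at rate lam^{alpha - 1}
for lam in (1e-4, 1e-8, 1e-12):
    print(lam, one_minus_laplace_over_lam(lam, 1.5))
```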

Remark 1.3. The distribution $\mu$ first arose in the work of Slack [35]. There it was used to describe the family sizes of critical Galton–Watson processes with heavy-tailed offspring distribution, at large times and conditioned on survival. More precisely, recall Yaglom's limit law [23, 36]. It states that for critical Galton–Watson processes with finite variance, the distribution of the number of individuals at time $n$, conditioned to be positive and then rescaled to have mean 1, converges to the exponential distribution with mean 1 as $n\to\infty$. Now suppose the offspring distribution is in the domain of attraction of a stable law of index $\alpha\in(1,2)$, and thus does not have finite variance. Slack showed that in this case the distribution of the number of individuals in generation $n$, conditioned to be positive and then rescaled to have mean 1, converges to $\mu$ as $n\to\infty$. This proves an analog of Yaglom's limit law for offspring distributions with infinite variance. The $\alpha$-stable CSBP for $\alpha\in(1,2)$ arises as a limit of Galton–Watson processes whose offspring distribution is in the domain of attraction of a stable law (see [16, 26]). Since beta coalescents can be recovered from the genealogy of such continuous-state branching processes, it is natural that the same distribution $\mu$ arises here as well. Although our proof never uses it explicitly, many of our results can be understood intuitively in terms of Slack's theorem.

We now consider the sizes of the blocks at small times. By evaluating the derivative of the right-hand side of (4) at zero, we see that $E[X_i']=1$ for all $i$. Therefore, by the law of large numbers, $S_k$ will be approximately $k$ for large $k$. At small times, the number of blocks will be large. Theorem 1.2 therefore suggests that when there are $k$ blocks, the asymptotic frequencies of these blocks will be distributed approximately like $k$ independent random variables with distribution $\mu$, each divided by $k$. Theorem 1.4 below makes this observation rigorous. The motivation for this result comes from the recent work of Bertoin and Le Gall, who proved a similar statement (see Theorem 4 of [8]). Bertoin and Le Gall's result applies to a larger family of $\Lambda$-coalescents, as it requires only a regular variation condition on $\Lambda$ near zero. However, Bertoin and Le Gall prove only convergence in probability, whereas we establish almost sure convergence for the beta coalescents.
Theorem 1.4. Let $(\Pi(t),t\ge 0)$ be the Beta$(2-\alpha,\alpha)$-coalescent, where $1<\alpha<2$. Let $N(t,x)$ be the number of blocks of $\Pi(t)$ whose asymptotic frequency is at most $x$. Let $F(x)=\mu((0,x])$ for all $x$, where $\mu$ is the probability distribution defined in (4). Then

$$\lim_{t\downarrow 0}\,\sup_{x\ge 0}\left|\,t^{1/(\alpha-1)}N\!\left(t,\,x\,t^{1/(\alpha-1)}\right)-\big(\alpha\Gamma(\alpha)\big)^{1/(\alpha-1)}F\!\left(\big(\alpha\Gamma(\alpha)\big)^{1/(\alpha-1)}x\right)\right|=0\quad\text{a.s.}$$

Note that by taking a limit as $x\to\infty$ in Theorem 1.4, we recover the result of Theorem 1.1 for the beta coalescents. Also, if $\alpha=2$ and $\mu$ is the exponential distribution with mean 1, then the expression $(\alpha\Gamma(\alpha))^{1/(\alpha-1)}F\big((\alpha\Gamma(\alpha))^{1/(\alpha-1)}x\big)$ becomes $2(1-e^{-2x})$. As observed in [8], we then again recover a known result for Kingman's coalescent (see Section 4.2 of [1]).

From Theorem 1.4, we obtain the following result for the size of the block containing the integer 1. As a consequence of Kingman's work [24] on exchangeable random partitions, for coalescents with proper frequencies the asymptotic frequency of the block containing 1 is a size-biased pick from the asymptotic frequencies of all of the blocks.

Proposition 1.5. Let $(\Pi(t),t\ge 0)$ be the Beta$(2-\alpha,\alpha)$-coalescent, where $1<\alpha<2$, and let $K(t)$ be the asymptotic frequency of the block of $\Pi(t)$ containing 1. Then

$$\big(\alpha\Gamma(\alpha)\big)^{1/(\alpha-1)}\,t^{-1/(\alpha-1)}K(t)\xrightarrow{d}X\quad\text{as } t\downarrow 0,$$

where $E[e^{-\lambda X}]=(1+\lambda^{\alpha-1})^{-\alpha/(\alpha-1)}$.

The distribution $\mu$ has mean one but infinite variance, and differentiating the Laplace transform of $X$ shows that $E[X]=\infty$. Also, as will be seen from the proof of the proposition, $X$ has the size-biased distribution $P(X\in dx)=x\,\mu(dx)$.
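The size-biasing claim can be checked from (4). If $P(X\in dx)=x\,\mu(dx)$ and $\mu$ has mean one, then $E[e^{-\lambda X}]=-\frac{d}{d\lambda}\int e^{-\lambda x}\mu(dx)$. The sketch below compares a central-difference derivative of (4) with the closed form in Proposition 1.5; the helper names are ours. It also shows $(1-E[e^{-\lambda X}])/\lambda$ growing without bound as $\lambda\downarrow 0$, which is consistent with $E[X]=\infty$.

```python
def laplace_mu(lam, alpha):
    """Laplace transform (4) of mu."""
    return 1.0 - lam / (1.0 + lam ** (alpha - 1)) ** (1.0 / (alpha - 1))


def laplace_x(lam, alpha):
    """Laplace transform of X in Proposition 1.5."""
    return (1.0 + lam ** (alpha - 1)) ** (-alpha / (alpha - 1))


def size_biased_numeric(lam, alpha, h=1e-6):
    """-(d/dlam) of the Laplace transform of mu, by central differences.

    Because mu has mean 1, this is the Laplace transform of x mu(dx)."""
    return -(laplace_mu(lam + h, alpha) - laplace_mu(lam - h, alpha)) / (2 * h)


alpha = 1.5
for lam in (0.5, 1.0, 2.0, 5.0):
    print(lam, laplace_x(lam, alpha), size_biased_numeric(lam, alpha))

# E[X] = infinity: (1 - E[exp(-lam X)]) / lam blows up as lam -> 0
for lam in (1e-2, 1e-4, 1e-6):
    print(lam, (1.0 - laplace_x(lam, alpha)) / lam)
```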

We also have the following result concerning the largest block of the coalescent at small times. The size of a typical block and the size of the block containing 1 are both of order $t^{1/(\alpha-1)}$. By contrast, the size of the largest block is of order $t^{1/\alpha}$, which is much larger as $t\downarrow 0$. This result follows from a Tauberian theorem, which gives information about the tail behavior of the distribution $\mu$, and from extreme value theory. Recall that a random variable $X$ is said to have a Fréchet distribution of index $\alpha$ if $P(X\le x)=e^{-x^{-\alpha}}$ for all $x>0$.

Proposition 1.6. Let $(\Pi(t),t\ge 0)$ be the Beta$(2-\alpha,\alpha)$-coalescent, where $1<\alpha<2$, and let $W(t)$ be the largest of the asymptotic block frequencies of $\Pi(t)$. Then

$$\big(\alpha\Gamma(\alpha)\Gamma(2-\alpha)\big)^{1/\alpha}\,t^{-1/\alpha}\,W(t)\xrightarrow{d}X\quad\text{as } t\downarrow 0,$$

where $X$ has the Fréchet distribution of index $\alpha$.
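Proposition 1.6 and Theorem 1.4 put the largest block and the typical blocks on different scales. The sketch below is our own illustration, with hypothetical names. It samples the Fréchet law by inverting $P(X\le x)=e^{-x^{-\alpha}}$ and compares the two deterministic normalizing scales; their ratio grows like $t^{1/\alpha-1/(\alpha-1)}$ as $t\downarrow 0$. It does not simulate the coalescent itself.

```python
import math
import random


def frechet_cdf(x, alpha):
    """P(X <= x) = exp(-x^{-alpha}) for x > 0, and 0 otherwise."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0


def frechet_sample(alpha, rng):
    """Inverse-CDF sampling: if U ~ Uniform(0, 1), then (-log U)^{-1/alpha} is Frechet(alpha)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / alpha)


def typical_and_largest_scales(t, alpha):
    """Normalizations from Theorem 1.4 (typical block) and Proposition 1.6 (largest block)."""
    typical = t ** (1 / (alpha - 1)) / (alpha * math.gamma(alpha)) ** (1 / (alpha - 1))
    largest = t ** (1 / alpha) / (alpha * math.gamma(alpha) * math.gamma(2 - alpha)) ** (1 / alpha)
    return typical, largest


rng = random.Random(0)
samples = [frechet_sample(1.5, rng) for _ in range(20000)]
print(sum(s <= 1.0 for s in samples) / len(samples), frechet_cdf(1.0, 1.5))
for t in (1e-2, 1e-4, 1e-6):
    typical, largest = typical_and_largest_scales(t, 1.5)
    print(t, largest / typical)  # grows like t^{1/alpha - 1/(alpha - 1)}
```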

This result suggests that there should be a whole range of block sizes between the typical size $t^{1/(\alpha-1)}$ and the largest block size $t^{1/\alpha}$. This is made more precise in [3], where we analyze in detail the multifractal nature of the Beta-coalescent.

1.3. Hausdorff and packing dimensions 

Given a $\mathcal{P}$-valued coalescent process, we can define a metric $d$ on $\mathbb{N}$ such that

$$d(i,j)=\inf\{t: i \text{ and } j \text{ are in the same block at time } t\}.$$

For all $i,j,k\in\mathbb{N}$, we have $d(i,j)\le\max\{d(i,k),d(k,j)\}$, so $d$ is an ultrametric on $\mathbb{N}$. Let $(S,d)$ be the completion of $(\mathbb{N},d)$, and note that the extension of $d$ to $S$ is also an ultrametric.
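The ultrametric inequality is easy to see in a concrete merger history. If $i$ and $k$ share a block at time $s$, and $k$ and $j$ share a block at time $s'$, then all three share a block at time $\max\{s,s'\}$. The sketch below computes $d(i,j)$ as the first time two integers meet and checks the inequality over all triples. The helper names and the merger-history format are hypothetical.

```python
import math
from itertools import combinations


def merge_distances(n, mergers):
    """d(i, j) = first time i and j lie in the same block, from a merger history.

    mergers: (time, members) events in increasing time order; each event merges
    every current block containing one of the listed integers into a single block.
    Returns a dict keyed by pairs (i, j) with i < j.
    """
    block = {i: frozenset([i]) for i in range(1, n + 1)}
    d = {}
    for t, members in mergers:
        merged = frozenset().union(*(block[i] for i in members))
        for i, j in combinations(sorted(merged), 2):
            d.setdefault((i, j), t)  # keep the first time the pair met
        for i in merged:
            block[i] = merged
    return d


def dist(d, i, j):
    """Ultrametric distance; pairs that never merged are at distance infinity."""
    if i == j:
        return 0.0
    return d.get((min(i, j), max(i, j)), math.inf)


history = [(0.5, [1, 2]), (1.0, [3, 4]), (2.0, [1, 3])]
d = merge_distances(4, history)
print(dist(d, 1, 2), dist(d, 3, 4), dist(d, 2, 4))
```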

We now review the definitions of the Hausdorff and packing dimensions, following closely the discussions in [20, 21]. Let $(X,d)$ be a metric space. For $U\subset X$, let $|U|=\sup\{d(x,y): x,y\in U\}$ denote the diameter of $U$. If $\{V_i\}_{i=1}^\infty$ is a collection of Borel sets such that $U\subset\bigcup_{i=1}^\infty V_i$, then we call $\{V_i\}_{i=1}^\infty$ a cover of $U$. If in addition $|V_i|\le\delta$ for all $i$, then we call $\{V_i\}_{i=1}^\infty$ a $\delta$-cover of $U$. Given $s>0$, the $s$-dimensional Hausdorff measure of $U$ is

$$H_s(U)=\lim_{\delta\downarrow 0}\inf\Big\{\sum_{i=1}^\infty|V_i|^s: \{V_i\}_{i=1}^\infty \text{ is a } \delta\text{-cover of } U\Big\}.$$

The Hausdorff dimension of $U$ is

$$\dim_H(U)=\inf\{s>0: H_s(U)=0\}=\sup\{s>0: H_s(U)=\infty\}.$$

If $\{V_i\}_{i=1}^\infty$ is a collection of disjoint open balls centered in $U$ such that $|V_i|\le\delta$ for all $i$, then we say $\{V_i\}_{i=1}^\infty$ is a $\delta$-packing of $U$. The $s$-dimensional packing premeasure of $U$ is

$$P_s(U)=\lim_{\delta\downarrow 0}\sup\Big\{\sum_{i=1}^\infty|V_i|^s: \{V_i\}_{i=1}^\infty \text{ is a } \delta\text{-packing of } U\Big\}. \tag{6}$$

The $s$-dimensional packing outer measure of $U$ is then defined to be

$$p_s(U)=\inf\Big\{\sum_{i=1}^\infty P_s(V_i): \{V_i\}_{i=1}^\infty \text{ is a cover of } U\Big\}. \tag{7}$$

The packing dimension of $U$ is

$$\dim_P(U)=\inf\{s>0: p_s(U)=0\}=\sup\{s>0: p_s(U)=\infty\}. \tag{8}$$

The Hausdorff dimension of a set is always less than or equal to the packing dimension (see, for example, 
Chapter 3 of [21]). 

Evans [20] investigated the fractal properties of the metric space associated with Kingman's coalescent. He showed that the Hausdorff and packing dimensions are both equal to one almost surely, and that the metric space is capacity-equivalent to the unit interval. Donnelly et al. [13] considered a coalescent process resulting from coalescing Brownian motions on the circle. They showed that the associated completed metric space $(S,d)$ almost surely has Hausdorff and packing dimensions of $1/2$ and is capacity-equivalent to the middle-$\tfrac12$ Cantor set. Our next result implies that the metric space associated with the Beta$(2-\alpha,\alpha)$-coalescent with $1<\alpha<2$ has Hausdorff and packing dimensions equal to $1/(\alpha-1)$. Note that again we get the correct result for Kingman's coalescent by substituting $\alpha=2$.

Theorem 1.7. Let $\Lambda$ be a finite measure on $[0,1]$ satisfying the conditions of Theorem 1.1. Let $(S,d)$ be the metric space associated with the $\Lambda$-coalescent $(\Pi(t),t\ge 0)$. Then the Hausdorff and packing dimensions of $S$ are both $1/(\alpha-1)$ almost surely.
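The value $1/(\alpha-1)$ can be anticipated from Theorem 1.1 by a covering argument. This is a heuristic for the upper bound only, not the proof given in Section 6, and the matching lower bound needs a separate argument. Because $d$ is an ultrametric, integers in different blocks of $\Pi(t)$ are at distance greater than $t$, while the closure of each block of $\Pi(t)$ has diameter at most $t$. Hence $\mathcal{N}_S(t)$, the smallest number of sets of diameter at most $t$ needed to cover $S$, equals $N(t)$. Writing $c$ for the right-hand side of (2) and using the standard inequalities $\dim_H\le\dim_P\le\overline{\dim}_B$ (upper box-counting dimension), one gets:

```latex
\begin{aligned}
\mathcal{N}_S(t) &= N(t) = c\,t^{-1/(\alpha-1)}\bigl(1+o(1)\bigr) \quad \text{as } t\downarrow 0 \text{ (a.s., by (2))},\\
\overline{\dim}_B\,S &= \limsup_{t\downarrow 0}\frac{\log N(t)}{\log(1/t)}
  = \lim_{t\downarrow 0}\frac{\tfrac{1}{\alpha-1}\log(1/t)+\log c+o(1)}{\log(1/t)}
  = \frac{1}{\alpha-1},\\
\dim_H S &\le \dim_P S \le \frac{1}{\alpha-1} \quad \text{a.s.}
\end{aligned}
```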

1.4. Dynamics of the number of blocks

Theorem 1.1 gives an almost sure limit theorem for the number of blocks in the coalescent at small times. Here we consider in more detail the dynamics of the process $(N(t),t\ge 0)$, for $\Lambda$-coalescents satisfying the assumptions of Theorem 1.1.

Let $\zeta_{n,k}$ be the probability that, if the $\Lambda$-coalescent has $n$ blocks, then it will lose exactly $k$ blocks at the time of the next merger. More precisely, let $(\Pi_n(t),t\ge 0)$ be the $\Lambda$-coalescent restricted to $\{1,\dots,n\}$, and let $N_n(t)$ be the number of blocks of $\Pi_n(t)$. If $T=\inf\{t:\Pi_n(t)\ne\Pi_n(0)\}$, then $\zeta_{n,k}=P(N_n(T)=n-k)$. Let $\lambda_{n,k}$ be given by (1), and let $\lambda_n=\sum_{k=2}^{n}\binom{n}{k}\lambda_{n,k}$ be the total merger rate when the coalescent has $n$ blocks. Then

$$\zeta_{n,k}=\frac{\binom{n}{k+1}\lambda_{n,k+1}}{\lambda_n},$$

because $k+1$ blocks have to merge for the number of blocks to be reduced by $k$. It is not difficult to calculate (see Lemma 4 in [8]) that

$$\lim_{n\to\infty}\zeta_{n,k}=\frac{\alpha\,\Gamma(k+1-\alpha)}{(k+1)!\,\Gamma(2-\alpha)}. \tag{9}$$

If we define $\zeta_k=\alpha\Gamma(k+1-\alpha)/[(k+1)!\,\Gamma(2-\alpha)]$, then $\sum_{k=1}^\infty\zeta_k=1$ and $\sum_{k=1}^\infty k\zeta_k=1/(\alpha-1)$, as shown in Eqs. (39) and (40) of [8]. Therefore, there is a probability distribution, which we call $\zeta$, on the positive integers corresponding to $(\zeta_k)_{k=1}^\infty$, and this distribution has mean $1/(\alpha-1)$. Thus, at small times, when the number of blocks is large, the successive jumps of the process $(N(t),t\ge 0)$ are approximately i.i.d. with distribution $\zeta$. We can use a renewal argument to establish the following theorem.
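The limiting jump distribution $\zeta$ is easy to tabulate. The sketch below builds $\zeta_k$ from $\zeta_1=\alpha/2$ and the ratio $\zeta_{k+1}/\zeta_k=(k+1-\alpha)/(k+2)$, both read off from (9); the function names are ours. It then checks numerically that the probabilities sum to one and that the mean is close to $1/(\alpha-1)$. Because $\zeta_k$ decays like $k^{-1-\alpha}$, truncated sums for the mean converge slowly.

```python
import math


def zeta_pmf(alpha, kmax):
    """[zeta_1, ..., zeta_kmax] with zeta_k = alpha Gamma(k+1-alpha) / ((k+1)! Gamma(2-alpha)).

    Uses zeta_1 = alpha / 2 and the ratio zeta_{k+1} / zeta_k = (k+1-alpha) / (k+2)."""
    probs = [alpha / 2.0]
    for k in range(1, kmax):
        probs.append(probs[-1] * (k + 1 - alpha) / (k + 2))
    return probs


def zeta_direct(k, alpha):
    """zeta_k from its Gamma-function formula, for cross-checking; (k+1)! = Gamma(k+2)."""
    return alpha * math.exp(math.lgamma(k + 1 - alpha) - math.lgamma(k + 2) - math.lgamma(2 - alpha))


alpha = 1.5
z = zeta_pmf(alpha, 100_000)
total = sum(z)
mean = sum(k * p for k, p in enumerate(z, start=1))
print(total, mean, 1 / (alpha - 1))  # truncated sums; the mean converges slowly
```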

Theorem 1.8. Let $\Lambda$ be a finite measure on $[0,1]$ satisfying the conditions of Theorem 1.1. Let $(\Pi(t),t\ge 0)$ be the $\Lambda$-coalescent. Let $N(t)$ be the number of blocks of $\Pi(t)$, and let $V_n$ be the event that $N(t)=n$ for some $t$. Then

$$\lim_{n\to\infty}P(V_n)=\alpha-1.$$

Once again, the case $\alpha=2$ corresponds to Kingman's coalescent, where $P(V_n)=1$ for all $n$ because the process $(N(t),t\ge 0)$ visits every integer. As $\alpha$ gets smaller, there are more large mergers that cause $(N(t),t\ge 0)$ to skip over some integers.
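The renewal heuristic behind Theorem 1.8 can be illustrated in an idealized setting where the downward jumps of $N$ are exactly i.i.d. with law $\zeta$. In that setting, the renewal theorem says that the probability of visiting a far-away level tends to $1/E[\zeta]=\alpha-1$. The sketch below solves the discrete renewal equation numerically. It is our own illustration, not the proof in Section 7, and it models only the limiting jump law, not the coalescent at finite $n$.

```python
def zeta_pmf(alpha, kmax):
    """zeta_1, ..., zeta_kmax from (9), via zeta_1 = alpha / 2 and a ratio recursion."""
    probs = [alpha / 2.0]
    for k in range(1, kmax):
        probs.append(probs[-1] * (k + 1 - alpha) / (k + 2))
    return probs


def renewal_visit_probs(alpha, mmax):
    """u_m = P(a renewal process with i.i.d. jumps ~ zeta, started at 0, ever visits m).

    Renewal equation: u_0 = 1 and u_m = sum_{k=1}^{m} zeta_k u_{m-k}.
    """
    zeta = zeta_pmf(alpha, mmax)  # zeta[k - 1] = zeta_k
    u = [1.0]
    for m in range(1, mmax + 1):
        u.append(sum(zeta[k - 1] * u[m - k] for k in range(1, m + 1)))
    return u


for alpha in (1.2, 1.5, 1.8):
    u = renewal_visit_probs(alpha, 1000)
    print(alpha, u[-1], alpha - 1)  # u_m -> 1 / mean = alpha - 1, slowly for alpha near 1
```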

1.5. Total time in the tree 

Given a $\Lambda$-coalescent $(\Pi(t),t\ge 0)$, consider the process $(R_n\Pi(t),t\ge 0)$. This is the coalescent restricted to $\{1,\dots,n\}$, so the process starts with just $n$ blocks. For $k=2,\dots,n$, let $D_k$ be the duration of time for which $R_n\Pi(t)$ has exactly $k$ blocks. Then

$$L_n=\sum_{k=2}^{n}kD_k$$

is the sum of the lengths of all the branches in the coalescent tree. This quantity has biological significance. Suppose the coalescent process represents the ancestral tree of a sample of $n$ individuals from the population, and a mutation occurs along one of the branches of this tree. Then the $n$ individuals in the sample will not all have the same gene at the site of the mutation. Consequently, if mutations occur at rate $\theta$ along each branch and each mutation happens at a different site, then the number of "segregating sites" at which the $n$ sampled individuals do not have the same gene should be approximately $\theta L_n$.
For Kingman's coalescent, it is easily verified that

$$\frac{L_n}{\log n}\xrightarrow{p}2,$$

where $\xrightarrow{p}$ denotes convergence in probability. Durrett and Schweinsberg [18] studied the case in which $\Lambda$ has a unit mass at zero as well as a component that allows for multiple mergers. Möhle [28] obtained a recursive equation for the limiting distribution of $n^{-1}L_n$ under the condition $\int_0^1 x^{-2}\,\Lambda(dx)<\infty$, which implies that the total merger rate is finite even when the number of blocks is infinite. The result below includes the Beta$(2-\alpha,\alpha)$-coalescents for $1<\alpha<2$.
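The Kingman statement can be checked at the level of expectations. With $k$ blocks the total merger rate is $\binom{k}{2}$, so $E[kD_k]=2/(k-1)$ and $E[L_n]=2\sum_{j=1}^{n-1}1/j$. The sketch below computes this exactly and shows $E[L_n]/\log n$ approaching 2. The correction is of order $1/\log n$ (it comes from Euler's constant), so convergence is slow.

```python
import math


def kingman_expected_length(n):
    """E[L_n] for Kingman's coalescent.

    With k blocks the total merger rate is k(k-1)/2, so E[k D_k] = 2/(k-1)."""
    return sum(2.0 / (k - 1) for k in range(2, n + 1))


for n in (10, 10**3, 10**6):
    print(n, kingman_expected_length(n) / math.log(n))  # tends to 2, slowly
```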

Theorem 1.9. Let $\Lambda$ be a finite measure on $[0,1]$ satisfying the conditions of Theorem 1.1. Let $L_n$ be as defined above for the $\Lambda$-coalescent. Then

$$\frac{L_n}{n^{2-\alpha}}\xrightarrow{p}\frac{\alpha(\alpha-1)}{A\,\Gamma(2-\alpha)\,(2-\alpha)}.$$
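For the Beta$(2-\alpha,\alpha)$ case, the mean of $L_n$ can be computed exactly from the block-counting chain. When the chain sits at $b$ blocks, it contributes $b$ branches for an exponential time with rate $\lambda_b=\sum_k\binom{b}{k}\lambda_{b,k}$. The sketch below is our own illustration, with hypothetical names. It propagates the probabilities of visiting each state downward from $n$, and it prints $E[L_n]/n^{2-\alpha}$ next to the constant of Theorem 1.9 with $A=1/[\Gamma(\alpha)\Gamma(2-\alpha)]$. The theorem concerns convergence in probability, and no claim is made here about how fast these expectations approach the constant.

```python
import math


def log_beta(p, q):
    """Logarithm of the Beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def merger_rates(b, alpha):
    """binom(b, k) * lambda_{b,k} for k = 2..b, with Lambda = Beta(2 - alpha, alpha)."""
    log_norm = log_beta(2 - alpha, alpha)
    return [math.exp(math.lgamma(b + 1) - math.lgamma(k + 1) - math.lgamma(b - k + 1)
                     + log_beta(k - alpha, b - k + alpha) - log_norm)
            for k in range(2, b + 1)]


def expected_total_length(n, alpha):
    """E[L_n] = sum over b of P(the block count visits b) * b / lambda_b."""
    visit = [0.0] * (n + 1)
    visit[n] = 1.0
    expected = 0.0
    for b in range(n, 1, -1):                 # states are visited in decreasing order
        rates = merger_rates(b, alpha)
        total = sum(rates)                    # lambda_b
        expected += visit[b] * b / total      # b branches, mean holding time 1 / lambda_b
        for k, r in enumerate(rates, start=2):
            visit[b - k + 1] += visit[b] * r / total
    return expected


alpha = 1.5
const = alpha * (alpha - 1) * math.gamma(alpha) / (2 - alpha)  # Theorem 1.9 for the beta case
for n in (10, 100, 400):
    print(n, expected_total_length(n, alpha) / n ** (2 - alpha), const)
```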
In [3], more precise results on the structure of the population under such a model are obtained using an approach based on continuous random trees.

The rest of this paper is organized as follows. In Section 2, we review some facts about continuous-state branching processes that we will need, and state the connection between CSBPs and Beta$(2-\alpha,\alpha)$-coalescents that was established in [11]. In Section 3, we record some results that will allow us to couple two coalescents with multiple collisions, which will be used to extend some of our results beyond the beta coalescents. We prove Theorem 1.1 in Section 4. Theorems 1.2 and 1.4 and Propositions 1.5 and 1.6 are proved in Section 5. We prove Theorem 1.7 in Section 6 and Theorems 1.8 and 1.9 in Section 7.



2. Beta coalescents and continuous-state branching processes 



In this section, we review the results in [11] that relate continuous-state branching processes to beta coalescents. Continuous-state branching processes are the continuous analogues of Galton–Watson processes. More formally, a continuous-state branching process is a $[0,\infty]$-valued Markov process $(Z(t),t\ge 0)$ whose transition functions $p_t(x,\cdot)$ satisfy

$$p_t(x+y,\cdot)=p_t(x,\cdot)*p_t(y,\cdot)\quad\text{for all } x,y\ge 0. \tag{10}$$

That is, the sum of independent copies of the process started at $x$ and $y$ has the same distribution as the process started at $x+y$. We think of $Z(t)$ as the size of a population at time $t$. Property (10) is called the branching property because it can loosely be interpreted as follows: if we start with a population of size $x+y$, then the number of offspring of the first $x$ individuals is independent of the number of offspring of the remaining $y$.

For each $t>0$, there is a function $u_t:[0,\infty)\to\mathbb{R}$ such that

$$E\big[e^{-\lambda Z(t)}\,\big|\,Z(0)=a\big]=e^{-a\,u_t(\lambda)}. \tag{11}$$

If we exclude processes with an instantaneous jump to infinity, the functions $u_t$ satisfy the differential equation

$$\frac{\partial u_t(\lambda)}{\partial t}=-\Psi\big(u_t(\lambda)\big), \tag{12}$$

where $\Psi:[0,\infty)\to\mathbb{R}$ is a function of the form

$$\Psi(u)=au+\beta u^2+\int_0^\infty\big(e^{-xu}-1+xu\,\mathbf{1}_{\{x<1\}}\big)\,\pi(dx), \tag{13}$$

where $a\in\mathbb{R}$, $\beta\ge 0$, and $\pi$ is a Lévy measure on $(0,\infty)$ satisfying $\int_0^\infty(1\wedge x^2)\,\pi(dx)<\infty$. The function $\Psi$ is called the branching mechanism of the CSBP.
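The stable case used below has $\Psi(u)=u^\alpha$ with $1<\alpha<2$ and Lévy measure $\pi(dx)=\frac{\alpha(\alpha-1)}{\Gamma(2-\alpha)}x^{-1-\alpha}dx$. One can check numerically that this measure produces $u^\alpha$ when the compensator $xu$ is used on all of $(0,\infty)$. Relative to (13), this amounts to taking $\beta=0$ and $a=\int_1^\infty x\,\pi(dx)$, which is finite because $\alpha>1$. The sketch below evaluates the integral; it is our own check, using a plain trapezoidal rule after the substitution $x=e^s$. The truncation range and step count are assumptions chosen for accuracy around $10^{-6}$.

```python
import math


def stable_psi_numeric(u, alpha, smin=-100.0, smax=100.0, steps=40000):
    """int_0^inf (exp(-x u) - 1 + x u) pi(dx), pi(dx) = alpha(alpha-1)/Gamma(2-alpha) x^{-1-alpha} dx.

    With x = e^s, x^{-1-alpha} dx becomes x^{-alpha} ds; the integrand then decays
    exponentially in |s|, so the trapezoidal rule on a truncated range is accurate.
    """
    c = alpha * (alpha - 1) / math.gamma(2 - alpha)
    h = (smax - smin) / steps
    total = 0.0
    for i in range(steps + 1):
        x = math.exp(smin + i * h)
        y = x * u
        if y < 1e-4:
            g = y * y * (0.5 - y / 6.0 + y * y / 24.0)  # Taylor series of e^{-y} - 1 + y
        else:
            g = math.expm1(-y) + y
        f = g * x ** (-alpha)
        total += 0.5 * f if i in (0, steps) else f
    return c * h * total


for alpha in (1.3, 1.5, 1.8):
    print(alpha, stable_psi_numeric(2.0, alpha), 2.0 ** alpha)
```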

As shown in [6], one can extend the CSBP to a two-parameter process $(Z(t,a),t\ge 0,a\ge 0)$ with the following properties. First, $Z(0,a)=a$ for all $a\ge 0$. Second, for all $a,b\ge 0$, the process $(Z(t,a+b)-Z(t,a),t\ge 0)$ is independent of $(Z(t,c),t\ge 0,0\le c\le a)$ and has the same law as a CSBP with branching mechanism $\Psi$ started at $b$. Here, we think of $Z(t,a)$ as the number of individuals at time $t$ descended from the first $a$ individuals at time zero. For fixed $t$, the process $(Z(t,a),a\ge 0)$ is a subordinator, and it then follows from (11) that the Laplace exponent of this subordinator is the function $\lambda\mapsto u_t(\lambda)$.

Along the same lines, one can work with a measure-valued process $(M_t,t\ge 0)$ taking its values in the set of finite measures on $[0,1]$, such that $(M_t([0,a]),t\ge 0,0\le a\le 1)$ has the same finite-dimensional distributions as $(Z(t,a),t\ge 0,0\le a\le 1)$. Now $(M_t([0,a]),0\le a\le 1)$ is a subordinator with Laplace exponent $\lambda\mapsto u_t(\lambda)$ run for time 1. If we set $Z(t)=M_t([0,1])$, then $(Z(t),t\ge 0)$ is a CSBP with branching mechanism $\Psi$ started at 1. An explicit construction of $(M_t,t\ge 0)$ can be given using the lookdown construction of Donnelly and Kurtz [14]. See also Section 2 of [11] for a review of this construction in the $\beta=0$ case. In [3], a construction of this process is obtained in terms of continuous stable random trees.
For the purposes of studying Beta$(2-\alpha,\alpha)$-coalescents with $1<\alpha<2$, we will consider CSBPs that have the stable branching mechanism $\Psi(\lambda)=\lambda^\alpha$, where $1<\alpha<2$. In this case, the Lévy measure is given by

$$\pi(dx)=\frac{\alpha(\alpha-1)}{\Gamma(2-\alpha)}\,x^{-1-\alpha}\,dx$$

(see, for example, Example 4 of [19]). Birkner et al. [11] showed that after a time change, the genealogy of this CSBP can be described by the Beta$(2-\alpha,\alpha)$-coalescent. The full construction of the beta coalescent relies on the lookdown construction of Donnelly and Kurtz [14]. We describe here an identity involving one-dimensional distributions, which will be sufficient for the applications in this paper. We assume that $Z(t)=M_t([0,1])$, where $(M_t,t\ge 0)$ is the measure-valued process defined above. To define the time change, for all $t\ge 0$ let

$$R(t)=\alpha(\alpha-1)\Gamma(\alpha)\int_0^t Z(s)^{1-\alpha}\,ds,$$

and let $R^{-1}(t)=\inf\{s: R(s)>t\}$. In Theorem 1.1 of [11], the time change is given only up to a constant, but one can determine the exact constant, for example, from the proof of Lemma 3.7 in [11]. Theorem 1.1 of [11] states that the process $\big(M_{R^{-1}(t)}/Z(R^{-1}(t)),t\ge 0\big)$ has the same law as the $\Lambda$-Fleming–Viot process introduced in [7], where $\Lambda$ is the Beta$(2-\alpha,\alpha)$ distribution. The following lemma then follows immediately from the duality discovered in [7] between the $\Lambda$-Fleming–Viot process and the $\Lambda$-coalescent.

Lemma 2.1. If $(\Pi(t),t\ge 0)$ is a Beta$(2-\alpha,\alpha)$-coalescent and $(\Theta(t),t\ge 0)$ is the associated ranked coalescent, then for all $t>0$, the distribution of $\Theta(t)$ is the same as the distribution of the sizes of the atoms of the measure $M_{R^{-1}(t)}/Z(R^{-1}(t))$, ranked in decreasing order.

Lemma 2.2 below describes the number and sizes of the atoms of $M_t$ when the CSBP has the stable branching mechanism $\Psi(\lambda)=\lambda^\alpha$. This result, in combination with Lemma 2.1, will be the key to using continuous-state branching processes to get information about the number and sizes of the blocks of beta coalescents. As will be seen from the proof, the Lévy measure of the subordinator with Laplace exponent $\lambda\mapsto u_t(\lambda)$ is finite for all $t>0$, so $M_t$ has only finitely many atoms.

Lemma 2.2. Assume $\Psi(\lambda)=\lambda^\alpha$. Let $D(t)$ be the number of atoms of $M_t$, and let $J(t)=(J_1(t),\dots,J_{D(t)}(t))$ be the sizes of the atoms of $M_t$, ranked in decreasing order. Then $D(t)$ is Poisson with mean $\theta_t=[(\alpha-1)t]^{-1/(\alpha-1)}$. Conditional on $D(t)=k$, the distribution of $J(t)$ is the same as the distribution of $(\theta_t^{-1}X_1,\dots,\theta_t^{-1}X_k)$, where $X_1,\dots,X_k$ are obtained by picking $k$ i.i.d. random variables with distribution $\mu$ and then ranking them in decreasing order.

Proof. When $\Psi(\lambda)=\lambda^\alpha$, it is possible to solve (12) explicitly with the initial condition $u_0(\lambda)=\lambda$, and we get

$$u_t(\lambda)=\big[(\alpha-1)t+\lambda^{1-\alpha}\big]^{-1/(\alpha-1)}$$

(see Eq. (2.15) of [26]). The sizes of the atoms of $M_t$ are precisely the sizes of the jumps of a subordinator with Laplace exponent $\lambda\mapsto u_t(\lambda)$ run for time 1. The number of atoms of $M_t$ is the number of jumps of this subordinator, which has the Poisson distribution with some mean $\theta_t$. Since $\lim_{\lambda\to\infty}\lambda^{-1}u_t(\lambda)=0$, the subordinator has no drift (see the formula at the bottom of p. 72 in [4]). Therefore, $P(Z(t)=0)=e^{-\theta_t}$. Using (11) with $a=1$ and the Monotone Convergence Theorem,

$$P(Z(t)=0)=\lim_{\lambda\to\infty}E\big[e^{-\lambda Z(t)}\big]=\lim_{\lambda\to\infty}e^{-u_t(\lambda)}.$$

It follows that

$$\theta_t=\lim_{\lambda\to\infty}u_t(\lambda)=\big[(\alpha-1)t\big]^{-1/(\alpha-1)}. \tag{14}$$

Since $\theta_t<\infty$, the subordinator has jumps at only a finite rate. The sizes of the jumps are nonnegative i.i.d. random variables, whose distribution can be read from the Laplace exponent of the subordinator.

To obtain the distribution of the jump sizes, let $\mu_t$ be the distribution of $\theta_t^{-1}X$, where $X$ has distribution $\mu$. Then, by (4),

$$\int_0^\infty\big(1-e^{-\lambda x}\big)\,\theta_t\,\mu_t(dx)=\theta_t\Big(1-\int_0^\infty e^{-\lambda x}\,\mu_t(dx)\Big)=\theta_t\Big(1-\int_0^\infty e^{-\lambda\theta_t^{-1}x}\,\mu(dx)\Big)=\theta_t\Big(1+\big(\lambda\theta_t^{-1}\big)^{1-\alpha}\Big)^{-1/(\alpha-1)}=u_t(\lambda),$$

where the last equality uses $\theta_t^{1-\alpha}=(\alpha-1)t$. It follows that $\theta_t\,\mu_t(dx)$ is the Lévy measure of the subordinator. Therefore, the subordinator has jumps with size distribution $\mu_t$ at rate $\theta_t$. This implies the lemma. □
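The closed forms in this proof can be cross-checked numerically. The sketch below checks three things; the function names are ours. First, $u_t(\lambda)$ solves (12) with $\Psi(u)=u^\alpha$ and $u_0(\lambda)=\lambda$. Second, $u_t(\lambda)$ approaches $\theta_t$ for large $\lambda$, as in (14). Third, the identity $\theta_t\big(1-\int e^{-\lambda x}\mu_t(dx)\big)=u_t(\lambda)$ holds with $\mu$ given by (4). It also prints $P(Z(t)=0)=e^{-\theta_t}$.

```python
import math


def u_t(t, lam, alpha):
    """Solution of (12) with Psi(u) = u^alpha and u_0(lam) = lam."""
    return ((alpha - 1) * t + lam ** (1 - alpha)) ** (-1 / (alpha - 1))


def theta(t, alpha):
    """(14): theta_t = [(alpha - 1) t]^{-1/(alpha - 1)}, the mean number of atoms of M_t."""
    return ((alpha - 1) * t) ** (-1 / (alpha - 1))


def laplace_mu(lam, alpha):
    """Laplace transform (4) of mu."""
    return 1.0 - lam / (1.0 + lam ** (alpha - 1)) ** (1.0 / (alpha - 1))


alpha, t, lam, h = 1.5, 0.3, 2.0, 1e-6
# (12): d/dt u_t(lam) = -u_t(lam)^alpha, checked with a central difference in t
ode_residual = (u_t(t + h, lam, alpha) - u_t(t - h, lam, alpha)) / (2 * h) + u_t(t, lam, alpha) ** alpha
# identity from the proof: theta_t * (1 - E[exp(-lam X / theta_t)]) = u_t(lam)
th = theta(t, alpha)
levy_gap = th * (1.0 - laplace_mu(lam / th, alpha)) - u_t(t, lam, alpha)
print(ode_residual, levy_gap, u_t(t, 1e16, alpha), th, math.exp(-th))  # last value: P(Z(t) = 0)
```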

The number of atoms in $M_t$ represents the number of individuals at time zero that have descendants alive at time $t$. Hence the number $D(t)$ of atoms of $M_t$ is almost surely a nonincreasing function of $t$; this is clear, for example, from the construction in [11]. Furthermore, as a consequence of the branching property of CSBPs, write

$$M_t=\sum_{i=1}^{D(t)}J_i(t)\,\delta_{a_i},$$

where $\delta_{a_i}$ denotes a unit mass at $a_i$. Then, conditional on $(M_s,0\le s\le t)$, the processes $(M_{t+s}(\{a_i\}),s\ge 0)$ for $i=1,\dots,D(t)$ have the same joint law as $D(t)$ independent CSBPs with branching mechanism $\Psi$ started from $J_1(t),\dots,J_{D(t)}(t)$. Also, almost surely $M_{t+s}(\{a\})=0$ for all $s\ge 0$ and $a\notin\{a_1,\dots,a_{D(t)}\}$.

Finally, we recall that every CSBP can be obtained as a time change of a Lévy process with no negative jumps, as shown in [27, 34]. Given $\Psi$ as in (13), let $(Y(t),t\ge0)$ be a Lévy process such that $Y(0)=a$ and $E[e^{-\lambda Y(t)}]=e^{-\lambda a+t\Psi(\lambda)}$. Define $(\tilde Y(t),t\ge0)$ to be the process $(Y(t),t\ge0)$ stopped when it hits zero. Let $U(t)=\inf\{s:\int_0^s\tilde Y(u)^{-1}\,du>t\}$. Then, if $(Z(t),t\ge0)$ is a CSBP with branching mechanism $\Psi$ and $Z(0)=a$, the processes $(Z(t),t\ge0)$ and $(\tilde Y(U(t)),t\ge0)$ have the same law, if we adopt the convention that $\tilde Y(\infty)=\infty$.

3. Coupling of coalescent processes

To extend our results for the $\mathrm{Beta}(2-\alpha,\alpha)$-coalescents to other $\Lambda$-coalescents, it will be important to have techniques for coupling two coalescents with multiple collisions. To carry out this coupling, we will use the Poisson process construction of $\Lambda$-coalescents introduced by Pitman [30]. For simplicity, we assume that $\Lambda(\{0\})=0$, which will be the case in our examples.

Let $Q_x$ denote the distribution of an infinite sequence of i.i.d. $\{0,1\}$-valued random variables that are one with probability $x$ and zero with probability $1-x$. Let $L$ be the measure on $\{0,1\}^\infty$ such that $L(B)=\int_0^1Q_x(B)\,x^{-2}\,\Lambda(dx)$ for all measurable sets $B$. We will construct the $\Lambda$-coalescent from a Poisson point process on $[0,\infty)\times\{0,1\}^\infty$ with intensity measure $dt\times L(d\xi)$. To do this, we first fix a positive integer $n$ and construct a $\mathcal P_n$-valued process $(\Pi_n(t),t\ge0)$. We set $\Pi_n(0)$ to be the partition of $\{1,\dots,n\}$ into singletons. If $(t,\xi)$ is a point of the Poisson process and $B_1,\dots,B_b$ are the blocks of $\Pi_n(t-)$, ranked in order by their smallest element, then we define $\Pi_n(t)$ to be the partition obtained from $\Pi_n(t-)$ by merging all of the blocks $B_i$ such that $\xi_i=1$, where we write $\xi=(\xi_1,\xi_2,\dots)$. Since $\Lambda$ is a finite measure, it is easy to verify that for any fixed $t$, there are only finitely many points $(s,\xi)$ such that $s\le t$ and at least two of $\xi_1,\dots,\xi_n$ equal one: the $L$-measure of the set of such $\xi$ is at most $\int_0^1\binom n2x^2\,x^{-2}\,\Lambda(dx)=\binom n2\Lambda([0,1])<\infty$. Consequently, the process $(\Pi_n(t),t\ge0)$ is well defined. Furthermore, these processes are defined consistently for different values of $n$, which means there exists a unique $\mathcal P$-valued process $(\Pi(t),t\ge0)$ such that $\Pi_n(t)=R_n\Pi(t)$ for all $n$ and $t$. The process $(\Pi(t),t\ge0)$ is the $\Lambda$-coalescent, as shown in [30].
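The restricted construction can be simulated directly. The Python sketch below does this for the $\mathrm{Beta}(2-\alpha,\alpha)$-coalescent and illustrates the construction only; it is not code from [30]. It relies on one observation: a fixed pattern with $k$ ones among $\xi_1,\dots,\xi_n$ has $L$-measure $\int_0^1x^k(1-x)^{n-k}x^{-2}\Lambda(dx)=\lambda_{n,k}$. Hence the points that can change $\Pi_n$ arrive at the finite rate $\sum_{k\ge2}\binom nk\lambda_{n,k}$. Given that such a point arrives, its number of ones $k$ has weight $\binom nk\lambda_{n,k}$, and the ones sit at uniformly random positions. For this $\Lambda$ we use the closed form $\lambda_{b,k}=B(k-\alpha,b-k+\alpha)/B(2-\alpha,\alpha)$. The function names are hypothetical.

```python
import math
import random


def beta_lambda(b, k, alpha):
    """lambda_{b,k} = int_0^1 x^(k-2) (1-x)^(b-k) Lambda(dx) for Lambda = Beta(2-alpha, alpha),
    which equals B(k - alpha, b - k + alpha) / B(2 - alpha, alpha)."""
    def log_beta(p, q):
        return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

    return math.exp(log_beta(k - alpha, b - k + alpha) - log_beta(2.0 - alpha, alpha))


def simulate(n, alpha, t_max, seed=0):
    """Jumps (time, partition) of Pi_n up to time t_max, built from the Poisson point process.

    Only points with at least two ones among xi_1, ..., xi_n can change Pi_n.  They arrive at
    rate sum_k C(n, k) lambda_{n,k}; given such a point, it has k ones among the first n
    coordinates with probability proportional to C(n, k) lambda_{n,k}, at uniform positions.
    """
    rng = random.Random(seed)
    ks = list(range(2, n + 1))
    weights = [math.comb(n, k) * beta_lambda(n, k, alpha) for k in ks]
    total_rate = sum(weights)
    blocks = [[i] for i in range(1, n + 1)]  # blocks ranked by their smallest element
    t = 0.0
    history = [(t, [list(blk) for blk in blocks])]
    while len(blocks) > 1:
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        k = rng.choices(ks, weights=weights)[0]
        ones = set(rng.sample(range(n), k))  # 0-based indices i with xi_{i+1} = 1
        chosen = [i for i in range(len(blocks)) if i in ones]
        if len(chosen) < 2:
            continue  # at most one block is hit, so Pi_n does not change
        merged = sorted(x for i in chosen for x in blocks[i])
        rest = [blk for i, blk in enumerate(blocks) if i not in ones]
        blocks = sorted(rest + [merged], key=lambda blk: blk[0])
        history.append((t, [list(blk) for blk in blocks]))
    return history


if __name__ == "__main__":
    for time, partition in simulate(12, 1.5, float("inf"), seed=3):
        print(round(time, 4), partition)
```

With $n$ large enough that $R_n\Pi(t)$ is close to $\Pi(t)$ at the times of interest, one could compare $t^{1/(\alpha-1)}$ times the number of blocks with the constant in Proposition 4.4 below. We have not calibrated how large $n$ must be for that comparison to be informative.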

Below are the two coupling lemmas that we will use. Lemma 3.1 allows us to restrict our attention to the behavior of $\Lambda$ in a neighborhood of zero when we are concerned with small-time asymptotics of $\Lambda$-coalescents. This result appears implicitly in [32], but we give the short proof for completeness. Lemma 3.2 will allow us to compare other $\Lambda$-coalescents to beta coalescents.
Lemma 3.1. Suppose $\Lambda_1$ and $\Lambda_2$ are finite measures on $[0,1]$ such that $\Lambda_1(\{0\})=\Lambda_2(\{0\})=0$ and, for some $\delta>0$, the restriction of $\Lambda_1$ to $[0,\delta]$ equals the restriction of $\Lambda_2$ to $[0,\delta]$. Then there exist $\mathcal P$-valued processes $(\Pi_1(t),t\ge0)$ and $(\Pi_2(t),t\ge0)$ such that $\Pi_1$ is a $\Lambda_1$-coalescent, $\Pi_2$ is a $\Lambda_2$-coalescent, and for some random time $\tau>0$, we have $\Pi_1(s)=\Pi_2(s)$ for all $s<\tau$.

Proof. For $i=1,2$, let $\Lambda_i'$ be the restriction of $\Lambda_i$ to $(\delta,1]$, and let $\Lambda_3'$ be the restriction of $\Lambda_1$ to $[0,\delta]$. Let $\Phi_1'$, $\Phi_2'$, and $\Phi_3'$ be independent Poisson point processes on $[0,\infty)\times\{0,1\}^\infty$ such that $\Phi_i'$ has intensity $dt\times L_i'(d\xi)$, where $L_i'(B)=\int_0^1Q_x(B)\,x^{-2}\,\Lambda_i'(dx)$ for all measurable $B$. Let $\Phi_1$ be the Poisson point process consisting of all points in $\Phi_1'$ and $\Phi_3'$, and let $\Phi_2$ be the Poisson point process consisting of all points in $\Phi_2'$ and $\Phi_3'$. For $i=1,2$, let $(\Pi_i(t),t\ge0)$ be the $\mathcal P$-valued coalescent process obtained from $\Phi_i$ as described above. Then $(\Pi_1(t),t\ge0)$ is a $\Lambda_1$-coalescent and $(\Pi_2(t),t\ge0)$ is a $\Lambda_2$-coalescent.

For $i=1,2$, the total mass of $L_i'$ is $\int_\delta^1x^{-2}\,\Lambda_i(dx)\le\delta^{-2}\Lambda_i([\delta,1])<\infty$. Thus, for $i=1,2$, if we define $\tau_i=\min\{s:(s,\xi)\text{ is a point of }\Phi_i'\}$, then $\tau_i>0$. Therefore, if $\tau=\min\{\tau_1,\tau_2\}$, then the restrictions of $\Phi_1$ and $\Phi_2$ to $[0,\tau)\times\{0,1\}^\infty$ are the same. It now follows from the construction that $\Pi_1(s)=\Pi_2(s)$ for all $s<\tau$. □

Lemma 3.2. Suppose $\Lambda_1$ and $\Lambda_2$ are finite measures on $[0,1]$ such that $\Lambda_1(\{0\})=\Lambda_2(\{0\})=0$ and $\Lambda_1(B)\ge\Lambda_2(B)$ for all measurable $B$. Then there exist $\mathcal P$-valued processes $(\Pi_1(t),t\ge0)$ and $(\Pi_2(t),t\ge0)$ such that $\Pi_1$ is a $\Lambda_1$-coalescent, $\Pi_2$ is a $\Lambda_2$-coalescent, and $N_1(t)\le N_2(t)$ for all $t\ge0$, where $N_i(t)$ is the number of blocks of $\Pi_i(t)$ for $i=1,2$.

Proof. For $i=1,2$, let $L_i$ be the measure on $\{0,1\}^\infty$ such that $L_i(B)=\int_0^1Q_x(B)\,x^{-2}\,\Lambda_i(dx)$ for all measurable $B$. Let $L_3(B)=L_1(B)-L_2(B)\ge0$ for all measurable $B$. Let $\Phi_2$ and $\Phi_3$ be independent Poisson point processes with intensities $dt\times L_2(d\xi)$ and $dt\times L_3(d\xi)$, respectively. Let $\Phi_1$ be the Poisson point process consisting of all points in $\Phi_2$ and $\Phi_3$, which has intensity $dt\times L_1(d\xi)$. For $i=1,2$, let $(\Pi_i(t),t\ge0)$ be the $\mathcal P$-valued coalescent process obtained from $\Phi_i$ as described above. Then $(\Pi_1(t),t\ge0)$ is a $\Lambda_1$-coalescent and $(\Pi_2(t),t\ge0)$ is a $\Lambda_2$-coalescent.

For $i=1,2$, let $N_{i,n}(t)$ be the number of blocks of $R_n\Pi_i(t)$. To show that $N_1(t)\le N_2(t)$ for all $t\ge0$, it suffices to show that $N_{1,n}(t)\le N_{2,n}(t)$ for all positive integers $n$ and all $t\ge0$. Each point of $\Phi_2$ is also a point of $\Phi_1$. Suppose $(t,\xi)$ is a point of $\Phi_2$, and let $A_t=\{i:\xi_i=0\text{ or }\xi_j=0\text{ for all }j<i\}$. Then $N_{1,n}(t)$ is the cardinality of $A_t\cap\{1,\dots,N_{1,n}(t-)\}$ and $N_{2,n}(t)$ is the cardinality of $A_t\cap\{1,\dots,N_{2,n}(t-)\}$. Therefore, if $N_{1,n}(t-)\le N_{2,n}(t-)$, then $N_{1,n}(t)\le N_{2,n}(t)$. A point of $\Phi_3$ acts only on $\Pi_1$ and can only decrease $N_{1,n}$. Since $N_{1,n}(0)=N_{2,n}(0)$, the result follows from the construction and the fact that the restricted processes $(R_n\Pi_1(t),t\ge0)$ and $(R_n\Pi_2(t),t\ge0)$ have only finitely many jump times. □
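The heart of this proof is pathwise. The map $b\mapsto\#(A_t\cap\{1,\dots,b\})$ is nondecreasing and never exceeds $b$, so feeding shared points to both processes and extra points to $\Pi_1$ alone preserves $N_{1,n}\le N_{2,n}$. The Python sketch below checks this on arbitrary, hypothetical patterns $\xi$. Because the ordering argument does not depend on the intensities $L_2$ and $L_3$, the sketch does not sample from them.

```python
import random


def blocks_after(b, ones):
    """Cardinality of A_t intersected with {1, ..., b}, where A_t = {i : xi_i = 0 or xi_j = 0
    for all j < i} and `ones` is the set of 1-based indices i with xi_i = 1.  This is the
    number of blocks left after merging the blocks B_i (i <= b) with xi_i = 1."""
    zeros = b - sum(1 for i in ones if i <= b)
    hit = bool(ones) and min(ones) <= b  # the first index carrying a one, if it is <= b
    return zeros + (1 if hit else 0)


def coupled_counts(n, num_points, p_shared=0.5, seed=0):
    """Toy coupling: each point belongs to Phi_2 (hence also to Phi_1) with probability
    p_shared, and otherwise only to Phi_3, in which case it acts on Pi_1 alone."""
    rng = random.Random(seed)
    n1 = n2 = n
    path = [(n1, n2)]
    for _ in range(num_points):
        k = rng.randint(2, n)
        ones = set(rng.sample(range(1, n + 1), k))
        if rng.random() < p_shared:  # point of Phi_2 and of Phi_1
            n1, n2 = blocks_after(n1, ones), blocks_after(n2, ones)
        else:  # point of Phi_3: only Pi_1 sees it
            n1 = blocks_after(n1, ones)
        path.append((n1, n2))
    return path


if __name__ == "__main__":
    print(coupled_counts(10, 8, seed=2))
```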

4. Number of blocks

In this section, we prove Theorem 1.1. Throughout this section, we assume that $(M_t,t\ge0)$ is the measure-valued process defined in Section 2 and that $R(t)$ and $R^{-1}(t)$ are as defined in Section 2. Since Lemma 2.2 gives the number of atoms of $M_t$, the key step involves analyzing the time change, which will make it possible to relate the number of blocks of the $\mathrm{Beta}(2-\alpha,\alpha)$-coalescent to the number of atoms of $M_t$.

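Lemma 4.1. Suppose $(Y(t),t\ge0)$ is a Lévy process such that $Y(0)=0$ and $E[e^{-\lambda Y(t)}]=e^{t\Psi(\lambda)}$, where $\Psi(\lambda)=\lambda^\alpha$ for some $\alpha\in(1,2)$. There exists a constant $C$ such that for all $t>0$ and $\varepsilon>0$, we have

$$P\Big(\sup_{0\le s\le t}|Y(s)|>\varepsilon\Big)\le Ct\varepsilon^{-\alpha}.$$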
Proof. Scaling properties of Lévy processes imply that for all $k>0$, the processes $(Y(t),t\ge0)$ and $(k^{-1/\alpha}Y(kt),t\ge0)$ have the same law. By taking $k=t$, we get

$$P\Big(\sup_{0\le s\le t}|Y(s)|>\varepsilon\Big)=P\Big(\sup_{0\le s\le1}|Y(s)|>t^{-1/\alpha}\varepsilon\Big).\tag{15}$$

It is known for Lévy processes with the above scaling property (see Exercise 2 in Chapter VIII of [4]) that there exists a constant $C_1$ such that

$$P\Big(\sup_{0\le s\le1}|Y(s)|>x\Big)\sim C_1x^{-\alpha},\tag{16}$$

where $\sim$ means that the ratio of the two sides tends to 1 as $x\to\infty$. The lemma follows from (15) and (16). □

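Lemma 4.2. There exists a constant $C$ such that for all $t>0$ and $\varepsilon>0$, we have

$$P\bigg(\frac{(1-\varepsilon)t}{\alpha(\alpha-1)\Gamma(\alpha)}\le R^{-1}(t)\le\frac{(1+\varepsilon)t}{\alpha(\alpha-1)\Gamma(\alpha)}\bigg)\ge1-Ct\varepsilon^{-\alpha}.$$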

Proof. Let $(Z(t),t\ge0)$ be a CSBP with branching mechanism $\Psi(\lambda)=\lambda^\alpha$ such that $Z(0)=1$. Since every CSBP can be obtained via a time change of a Lévy process with no negative jumps, as explained at the end of Section 2, we may assume that there is a Lévy process $(Y(t),t\ge0)$ satisfying $Y(0)=1$ and $E[e^{-\lambda Y(t)}]=e^{-\lambda+t\Psi(\lambda)}$ such that $Z(t)=\tilde Y(U(t))$ for all $t$, where $U(t)=\inf\{s:\int_0^s\tilde Y(u)^{-1}\,du>t\}$ and $(\tilde Y(t),t\ge0)$ is the process $(Y(t),t\ge0)$ stopped when it hits zero.

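Assume $\varepsilon<1/2$, and let $K=4/[\alpha(\alpha-1)\Gamma(\alpha)]$. Assume that $|Y(s)-1|\le\varepsilon$ for all $s\in[0,Kt]$, which happens with probability at least $1-Ct\varepsilon^{-\alpha}$ for some constant $C$ by Lemma 4.1. We then have $(1-\varepsilon)s\le U(s)\le(1+\varepsilon)s$ for all $s\in[0,Kt/2]$. Since

$$R(s)=\alpha(\alpha-1)\Gamma(\alpha)\int_0^s\tilde Y(U(r))^{1-\alpha}\,dr$$

for all $s$, it follows that

$$\alpha(\alpha-1)\Gamma(\alpha)(1+\varepsilon)^{1-\alpha}s\le R(s)\le\alpha(\alpha-1)\Gamma(\alpha)(1-\varepsilon)^{1-\alpha}s$$

for all $s\in[0,Kt/2]$. Therefore,

$$\frac{(1-\varepsilon)^{\alpha-1}t}{\alpha(\alpha-1)\Gamma(\alpha)}\le R^{-1}(t)\le\frac{(1+\varepsilon)^{\alpha-1}t}{\alpha(\alpha-1)\Gamma(\alpha)}.$$

Since $1-\varepsilon\le(1-\varepsilon)^{\alpha-1}$ and $(1+\varepsilon)^{\alpha-1}\le1+\varepsilon$, the lemma follows. □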

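Lemma 4.3. Let $(\Pi(t),t\ge0)$ be the $\mathrm{Beta}(2-\alpha,\alpha)$-coalescent, where $1<\alpha<2$, and let $N(t)$ be the number of blocks of $\Pi(t)$. Fix $\varepsilon>0$. There exists a constant $C$ depending on $\varepsilon$ such that for all $t>0$, we have

$$P\Big((1-\varepsilon)(\alpha\Gamma(\alpha))^{1/(\alpha-1)}\le t^{1/(\alpha-1)}N(t)\le(1+\varepsilon)(\alpha\Gamma(\alpha))^{1/(\alpha-1)}\Big)\ge1-Ct.$$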

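Proof. Recall that $(M_t,t\ge0)$ is the measure-valued process defined in Section 2. Let $D(s)$ be the number of atoms of $M_s$. By Lemma 2.2, the distribution of $D(s)$ is Poisson with mean $\theta_s=[(\alpha-1)s]^{-1/(\alpha-1)}$, so $E[D(s)]=\mathrm{Var}(D(s))=\theta_s$. By Chebyshev's Inequality, we have $P(|D(s)-\theta_s|>\delta\theta_s)\le1/(\delta^2\theta_s)$. Therefore, if $\delta$ is small enough that $1+\varepsilon>(1+\delta)(1-\delta)^{-1/(\alpha-1)}$ and $1-\varepsilon<(1-\delta)(1+\delta)^{-1/(\alpha-1)}$, then

$$P\bigg(D\Big(\frac{(1-\delta)t}{\alpha(\alpha-1)\Gamma(\alpha)}\Big)>(1+\varepsilon)(\alpha\Gamma(\alpha))^{1/(\alpha-1)}t^{-1/(\alpha-1)}\bigg)\le\frac1{\delta^2}\Big(\frac{(1-\delta)t}{\alpha\Gamma(\alpha)}\Big)^{1/(\alpha-1)}\tag{17}$$

and likewise

$$P\bigg(D\Big(\frac{(1+\delta)t}{\alpha(\alpha-1)\Gamma(\alpha)}\Big)<(1-\varepsilon)(\alpha\Gamma(\alpha))^{1/(\alpha-1)}t^{-1/(\alpha-1)}\bigg)\le\frac1{\delta^2}\Big(\frac{(1+\delta)t}{\alpha\Gamma(\alpha)}\Big)^{1/(\alpha-1)}.\tag{18}$$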
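Since $N(t)$ has the same distribution as $D(R^{-1}(t))$ by Lemma 2.1 and $D(t)$ is a decreasing function of $t$, the result follows from (17), (18), and Lemma 4.2. (Because $1/(\alpha-1)>1$, the right-hand sides of (17) and (18) are at most a constant times $t$ when $t\le1$; for $t>1$ the claim is trivial once $C\ge1$.) □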
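The proof of Lemma 4.3 leaves the choice of $\delta$ implicit. The Python sketch below finds an admissible $\delta$ on a grid. The first right-hand side, $(1+\delta)(1-\delta)^{-1/(\alpha-1)}$, increases in $\delta$, and the second, $(1-\delta)(1+\delta)^{-1/(\alpha-1)}$, decreases, so the admissible values form an interval $(0,\delta^*)$. The sketch also evaluates the right-hand side of (17). The grid search and the function names are our own illustrative choices.

```python
import math


def delta_conditions(delta, eps, alpha):
    # The two conditions imposed on delta before (17) and (18).
    p = 1.0 / (alpha - 1.0)
    upper_ok = 1.0 + eps > (1.0 + delta) * (1.0 - delta) ** (-p)
    lower_ok = 1.0 - eps < (1.0 - delta) * (1.0 + delta) ** (-p)
    return upper_ok and lower_ok


def admissible_delta(eps, alpha, grid=10_000):
    # Largest grid point i/grid satisfying both conditions (they hold on an interval (0, delta*)).
    best = None
    for i in range(1, grid):
        d = i / grid
        if not delta_conditions(d, eps, alpha):
            break
        best = d
    return best


def bound_17(t, delta, alpha):
    # Right-hand side of (17): delta^(-2) ((1 - delta) t / (alpha Gamma(alpha)))^(1/(alpha - 1)).
    return delta ** -2 * ((1.0 - delta) * t / (alpha * math.gamma(alpha))) ** (1.0 / (alpha - 1.0))


if __name__ == "__main__":
    d = admissible_delta(0.1, 1.5)
    print(d, [bound_17(t, d, 1.5) / t for t in (1e-2, 1e-3, 1e-4)])
```

The printed ratios bound_17(t)/t shrink as t decreases, because the exponent $1/(\alpha-1)$ exceeds 1. This is why the error term in Lemma 4.3 can be written as $Ct$.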

We are now ready to prove Theorem 1.1 for the $\mathrm{Beta}(2-\alpha,\alpha)$-coalescents.

Proposition 4.4. Let $(\Pi(t),t\ge0)$ be the $\mathrm{Beta}(2-\alpha,\alpha)$-coalescent, where $1<\alpha<2$, and let $N(t)$ be the number of blocks of $\Pi(t)$. Then

$$\lim_{t\downarrow0}t^{1/(\alpha-1)}N(t)=(\alpha\Gamma(\alpha))^{1/(\alpha-1)}\quad\text{a.s.}$$

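Proof. Let $\varepsilon>0$. Fix $t>0$, and let $t_j=t(1-\varepsilon)^j$ for $j=0,1,2,\dots$. Let $B$ be the event that for all $j$, we have

$$(1-\varepsilon)(\alpha\Gamma(\alpha))^{1/(\alpha-1)}\le t_j^{1/(\alpha-1)}N(t_j)\le(1+\varepsilon)(\alpha\Gamma(\alpha))^{1/(\alpha-1)}.\tag{19}$$

By Lemma 4.3, there is a constant $C$ depending on $\varepsilon$ such that

$$P(B)\ge1-C\sum_{j=0}^\infty t(1-\varepsilon)^j=1-C\varepsilon^{-1}t.\tag{20}$$

Suppose $B$ occurs and $0<s\le t$. Then for some $j$, we have $t_{j+1}<s\le t_j$, which implies that $N(t_j)\le N(s)\le N(t_{j+1})$, since the number of blocks is a decreasing function of time. From (19) and the definition of the $t_j$, we get

$$(1-\varepsilon)^{1+1/(\alpha-1)}(\alpha\Gamma(\alpha))^{1/(\alpha-1)}\le s^{1/(\alpha-1)}N(s)\le(1+\varepsilon)(1-\varepsilon)^{-1/(\alpha-1)}(\alpha\Gamma(\alpha))^{1/(\alpha-1)}.$$

Letting $t\downarrow0$ and using (20), we get

$$\liminf_{t\downarrow0}t^{1/(\alpha-1)}N(t)\ge(1-\varepsilon)^{1+1/(\alpha-1)}(\alpha\Gamma(\alpha))^{1/(\alpha-1)}\quad\text{a.s.}$$

and

$$\limsup_{t\downarrow0}t^{1/(\alpha-1)}N(t)\le(1+\varepsilon)(1-\varepsilon)^{-1/(\alpha-1)}(\alpha\Gamma(\alpha))^{1/(\alpha-1)}\quad\text{a.s.}$$

Letting $\varepsilon\downarrow0$ completes the proof. □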

Remark 4.5. Note that one cannot conclude Proposition 4.4 simply by combining the facts that $D(t)$ is asymptotically equivalent to $\theta_t$ as $t\downarrow0$ and that $R^{-1}(t)$ is asymptotically equivalent to $t/[\alpha(\alpha-1)\Gamma(\alpha)]$ as $t\downarrow0$. The more involved argument in Proposition 4.4 is necessary because Lemma 2.2 only establishes an equality in distribution at individual times. Consequently, almost sure results about the CSBP as $t\downarrow0$ do not immediately translate to the coalescent.

Proof of Theorem 1.1. Let $\varepsilon>0$, and then choose $\delta>0$ such that $(A-\varepsilon)x^{1-\alpha}(1-x)^{\alpha-1}\le f(x)\le(A+\varepsilon)x^{1-\alpha}(1-x)^{\alpha-1}$ for all $x\in[0,\delta]$. Let $\Lambda_0$ be the finite measure on $[0,1]$ with density $f(x)\mathbf 1_{\{x\le\delta\}}$. Let $\Lambda_1$ be the finite measure on $[0,1]$ with density $(A-\varepsilon)x^{1-\alpha}(1-x)^{\alpha-1}\mathbf 1_{\{x\le\delta\}}$, and let $\Lambda_2$ be the finite measure on $[0,1]$ with density $(A+\varepsilon)x^{1-\alpha}(1-x)^{\alpha-1}\mathbf 1_{\{x\le\delta\}}$. For $i=0,1,2$, let $(\Pi_i(t),t\ge0)$ be a $\Lambda_i$-coalescent, and let $N_i(t)$ denote the number of blocks in $\Pi_i(t)$.

Note that if $(\Pi(t),t\ge0)$ is a $\Lambda$-coalescent and $c$ is a constant, then $(\Pi(ct),t\ge0)$ is a $c\Lambda$-coalescent. This fact, combined with Proposition 4.4 and Lemma 3.1, gives

$$\lim_{t\downarrow0}t^{1/(\alpha-1)}N_1(t)=\Big(\frac{\alpha}{(A-\varepsilon)\Gamma(2-\alpha)}\Big)^{1/(\alpha-1)}\quad\text{a.s.}$$

and

$$\lim_{t\downarrow0}t^{1/(\alpha-1)}N_2(t)=\Big(\frac{\alpha}{(A+\varepsilon)\Gamma(2-\alpha)}\Big)^{1/(\alpha-1)}\quad\text{a.s.}$$

By applying Lemma 3.2 to $\Lambda_0$ and $\Lambda_1$, we get

$$\limsup_{t\downarrow0}t^{1/(\alpha-1)}N_0(t)\le\Big(\frac{\alpha}{(A-\varepsilon)\Gamma(2-\alpha)}\Big)^{1/(\alpha-1)}\quad\text{a.s.}\tag{21}$$

Likewise, by applying Lemma 3.2 to $\Lambda_0$ and $\Lambda_2$, we get

$$\liminf_{t\downarrow0}t^{1/(\alpha-1)}N_0(t)\ge\Big(\frac{\alpha}{(A+\varepsilon)\Gamma(2-\alpha)}\Big)^{1/(\alpha-1)}\quad\text{a.s.}\tag{22}$$

The conclusion of the theorem for the $\Lambda_0$-coalescent now follows by letting $\varepsilon\downarrow0$ in (21) and (22). The conclusion for the original $\Lambda$-coalescent then follows from Lemma 3.1. □
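The constants in this proof fit together as follows. The $\mathrm{Beta}(2-\alpha,\alpha)$ density is $x^{1-\alpha}(1-x)^{\alpha-1}/(\Gamma(2-\alpha)\Gamma(\alpha))$, so it corresponds to $A=1/(\Gamma(2-\alpha)\Gamma(\alpha))$. With this $A$, the limit $(\alpha/(A\Gamma(2-\alpha)))^{1/(\alpha-1)}$ should reduce to the constant of Proposition 4.4. Also, replacing $A$ by $cA$ (the time change $t\mapsto ct$) should multiply the limit by $c^{-1/(\alpha-1)}$. The Python sketch below checks both facts numerically; the function names are ours.

```python
import math


def theorem_1_1_constant(alpha, A):
    # A.s. limit of t^(1/(alpha-1)) N(t) when Lambda has density f with f(x) ~ A x^(1-alpha).
    return (alpha / (A * math.gamma(2.0 - alpha))) ** (1.0 / (alpha - 1.0))


def beta_density_constant(alpha):
    # Leading constant A of the Beta(2 - alpha, alpha) density near 0.
    return 1.0 / (math.gamma(2.0 - alpha) * math.gamma(alpha))


def proposition_4_4_constant(alpha):
    return (alpha * math.gamma(alpha)) ** (1.0 / (alpha - 1.0))


if __name__ == "__main__":
    for alpha in (1.1, 1.5, 1.9):
        A = beta_density_constant(alpha)
        print(alpha, theorem_1_1_constant(alpha, A), proposition_4_4_constant(alpha))
```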



5. Block sizes

Our goal in this section is to prove Theorems 1.2 and 1.4 and Propositions 1.5 and 1.6, all of which pertain to the sizes of the blocks in the coalescent.

5.1. One-dimensional distributions

We first prove Theorem 1.2. Although Theorem 1.2 is stated for $\alpha\in(1,2)$, the proof also works for $\alpha=2$, so we get an alternative proof of the fact that for Kingman's coalescent, $\Theta(T_k)$ is uniformly distributed on $\Delta_k$.



Proof of Theorem 1.2. Let $\lambda_k=\sum_{j=2}^k\binom kj\lambda_{k,j}$ denote the total rate of all mergers when the coalescent has $k$ blocks. Let $V_k=\{N(T_k)=k\}$ be the event that at some time the coalescent has exactly $k$ blocks. Conditional on $V_k$, the amount of time for which the coalescent has $k$ blocks has an exponential distribution with mean $\lambda_k^{-1}$, and this time is independent of $\Theta(T_k)$. Therefore, if $B$ is a measurable subset of $\Delta_k$, then

$$P(\Theta(T_k)\in B\mid V_k)=\frac{P(\{\Theta(T_k)\in B\}\cap V_k)}{P(V_k)}=\frac{\lambda_k}{P(V_k)}\,E\Big[\int_0^\infty\mathbf 1_{\{N(t)=k,\,\Theta(t)\in B\}}\,dt\Big].\tag{23}$$



Let $(Z(t),t\ge0)$ be a CSBP with branching mechanism $\Psi(\lambda)=\lambda^\alpha$, obtained from the measure-valued process $(M_t,t\ge0)$ as in Section 2. Let $D(t)$ be the number of atoms of $M_t$, and let $J(t)=(J_1(t),\dots,J_{D(t)}(t))$ be the sequence consisting of the sizes of the atoms of $M_t$, ranked in decreasing order. Let $J^*(t)=0$ on $\{Z(t)=0\}$, and let $J^*(t)=J(t)/Z(t)$ on $\{Z(t)>0\}$, so the terms in the sequence $J^*(t)$ sum to one for all $t$ such that $Z(t)>0$. By Lemma 2.1, the distribution of $(N(t),\Theta(t))$ is the same as the distribution of $(D(R^{-1}(t)),J^*(R^{-1}(t)))$. Combining this result with Fubini's theorem and then making the change of variables $s=R^{-1}(t)$, we have

$$E\Big[\int_0^\infty\mathbf 1_{\{N(t)=k,\,\Theta(t)\in B\}}\,dt\Big]=E\Big[\int_0^\infty\mathbf 1_{\{D(R^{-1}(t))=k,\,J^*(R^{-1}(t))\in B\}}\,dt\Big]=E\Big[\int_0^\infty\alpha(\alpha-1)\Gamma(\alpha)Z(s)^{1-\alpha}\mathbf 1_{\{D(s)=k,\,J^*(s)\in B\}}\,ds\Big].\tag{24}$$
Recall from (14) that $\theta_s=[(\alpha-1)s]^{-1/(\alpha-1)}$ and therefore $\theta_s^{1-\alpha}=(\alpha-1)s$. Therefore, (23) and (24) imply

$$P(\Theta(T_k)\in B\mid V_k)=\frac{\lambda_k\alpha\Gamma(\alpha)}{P(V_k)}\,E\Big[\int_0^\infty s^{-1}(\theta_sZ(s))^{1-\alpha}\mathbf 1_{\{D(s)=k,\,J^*(s)\in B\}}\,ds\Big]=\frac{\lambda_k\alpha\Gamma(\alpha)}{P(V_k)}\int_0^\infty s^{-1}P(D(s)=k)\,E\big[(\theta_sZ(s))^{1-\alpha}\mathbf 1_{\{J^*(s)\in B\}}\,\big|\,D(s)=k\big]\,ds.$$



Recall that $X_1,\dots,X_k$ are obtained by picking $k$ i.i.d. random variables with distribution $\mu$, and then ranking them in decreasing order. Also, recall that $S_k=X_1+\dots+X_k$. By Lemma 2.2, the conditional distribution of $J(s)$ given $D(s)=k$ is the same as the distribution of $(\theta_s^{-1}X_1,\dots,\theta_s^{-1}X_k)$. Because $Z(s)=J_1(s)+\dots+J_{D(s)}(s)$, it follows that the joint distribution of $(\theta_sZ(s),J^*(s))$, conditional on $D(s)=k$, is the same as the joint distribution of $(S_k,(X_1/S_k,\dots,X_k/S_k))$. Note that this distribution is the same for all $s$. Therefore, for $g=\mathbf 1_B$, we have

$$P(\Theta(T_k)\in B\mid V_k)=\frac{\lambda_k\alpha\Gamma(\alpha)}{P(V_k)}\int_0^\infty s^{-1}P(D(s)=k)\,E\Big[S_k^{1-\alpha}g\Big(\frac{X_1}{S_k},\dots,\frac{X_k}{S_k}\Big)\Big]\,ds=C\,E\Big[S_k^{1-\alpha}g\Big(\frac{X_1}{S_k},\dots,\frac{X_k}{S_k}\Big)\Big],\tag{25}$$

where $C=\lambda_k\alpha\Gamma(\alpha)P(V_k)^{-1}\int_0^\infty s^{-1}P(D(s)=k)\,ds$. By taking $B=\Delta_k$ so that both sides of (25) equal one, we get $C=E[S_k^{1-\alpha}]^{-1}$. This establishes (5) when $g$ is an indicator function. The result for arbitrary nonnegative measurable $g$ now follows from the linearity of expectation and the Monotone Convergence Theorem. □
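In the Kingman case $\alpha=2$ mentioned at the start of this subsection, the Laplace transform $1-(1+\lambda^{1-\alpha})^{-1/(\alpha-1)}$ of $\mu$ (used in the proof of Lemma 2.2) becomes $1/(1+\lambda)$. So $\mu$ is the exponential distribution with mean 1. Then $X/S_k$ is independent of $S_k$ and uniform on the simplex, and the tilt $S_k^{1-\alpha}$ in (25) has no effect. The Monte Carlo sketch below checks this for $g$ equal to the largest coordinate. It compares the tilted estimate with the known mean $(1/k)(1+1/2+\dots+1/k)$ of the largest coordinate of a uniform point on the simplex. The sample size and function names are our own choices.

```python
import random


def tilted_mean_of_largest_fraction(k, alpha=2.0, samples=100_000, seed=0):
    """Monte Carlo estimate of E[S_k^(1-alpha) max_i(X_i / S_k)] / E[S_k^(1-alpha)].

    Only alpha = 2 is supported: there mu is the Exp(1) law, which is easy to sample.
    """
    if alpha != 2.0:
        raise ValueError("this sketch only samples mu in the Kingman case alpha = 2")
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(samples):
        xs = [rng.expovariate(1.0) for _ in range(k)]
        s = sum(xs)
        w = s ** (1.0 - alpha)  # the tilt S_k^(1 - alpha)
        num += w * max(xs) / s
        den += w
    return num / den


def uniform_simplex_mean_largest(k):
    # Mean of the largest coordinate of a uniform point on the k-simplex: (1/k) * sum_{j<=k} 1/j.
    return sum(1.0 / j for j in range(1, k + 1)) / k


if __name__ == "__main__":
    print(tilted_mean_of_largest_fraction(3), uniform_simplex_mean_largest(3))
```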



5.2. Block sizes at small times

Our next goal is to prove Theorem 1.4. Our first lemma bounds the fluctuations of a continuous-state branching process for small times.

Lemma 5.1. There exists a constant $C$ such that for all $a>0$, $t>0$, and $\varepsilon>0$, if $(Z(t),t\ge0)$ is a CSBP with stable branching mechanism $\Psi(\lambda)=\lambda^\alpha$ with $Z(0)=a$, then

$$q(a,t,\varepsilon)=P\Big(\sup_{0\le s\le t}|Z(s)-a|>\varepsilon\Big)\le C(a+\varepsilon)t\varepsilon^{-\alpha}.$$

In particular, for any constants $C_1$, $C_2$, and $C_3$, we have

$$\lim_{t\downarrow0}q(C_1t^{1/\alpha},C_2t,C_3t^{1/\alpha})=0.$$

Proof. As in the proof of Lemma 4.2, we may assume that there is a Lévy process $(Y(t),t\ge0)$ satisfying $Y(0)=a$ and $E[e^{-\lambda Y(t)}]=e^{-\lambda a+t\Psi(\lambda)}$ such that $Z(t)=\tilde Y(U(t))$ for all $t$, where $U(t)=\inf\{s:\int_0^s\tilde Y(u)^{-1}\,du>t\}$, and $(\tilde Y(t),t\ge0)$ is the process $(Y(t),t\ge0)$ stopped when it hits zero. If $|Y(s)-a|\le\varepsilon$ for $0\le s\le(a+\varepsilon)t$, then $U(s)\le(a+\varepsilon)s$ for all $s\le t$, and therefore $|Z(s)-a|=|\tilde Y(U(s))-a|\le\varepsilon$ for all $s\le t$. The result now follows from Lemma 4.1. □

Lemma 5.2. Let $(\Pi(t),t\ge0)$ be the $\mathrm{Beta}(2-\alpha,\alpha)$-coalescent, where $1<\alpha<2$, and let $N(t,x)$ be the number of blocks of $\Pi(t)$ whose asymptotic frequency is at most $x$. Let $\gamma=(\alpha\Gamma(\alpha))^{1/(\alpha-1)}$. Fix $\varepsilon>0$ and $x>0$. There exists a constant $C$ depending on $\varepsilon$ and $x$ such that for all $t>0$, we have

$$P\Big((1-\varepsilon)\gamma F((1-\varepsilon)\gamma x)\le t^{1/(\alpha-1)}N(t,t^{1/(\alpha-1)}x)\le(1+\varepsilon)\gamma F((1+\varepsilon)\gamma x)\Big)\ge1-Ct^{1/2}.$$
Proof. Fix $t>0$ and $\varepsilon>0$. Define $t_-=(1-t^{1/(2\alpha)})t/[\alpha(\alpha-1)\Gamma(\alpha)]$ and $t_+=(1+t^{1/(2\alpha)})t/[\alpha(\alpha-1)\Gamma(\alpha)]$. Let $B_{1,t}$ be the event that $t_-\le R^{-1}(t)\le t_+$. Let $B_{2,t}$ be the event that $|Z(s)-1|\le t^{1/(2\alpha)}$ for all $s\le t_+$. Let $A_3=\{a:0<M_s(\{a\})\le(1-t^{1/(2\alpha)})t^{1/(\alpha-1)}x\text{ for all }s\in[t_-,t_+]\}$ and let $A_4=\{a:0<M_s(\{a\})\le(1+t^{1/(2\alpha)})t^{1/(\alpha-1)}x\text{ for some }s\in[t_-,t_+]\}$. Then, letting $\#S$ denote the cardinality of the set $S$, we define $B_{3,t}$ to be the event that $\#A_3\ge(1-\varepsilon)t^{-1/(\alpha-1)}\gamma F((1-\varepsilon)\gamma x)$ and $B_{4,t}$ to be the event that $\#A_4\le(1+\varepsilon)t^{-1/(\alpha-1)}\gamma F((1+\varepsilon)\gamma x)$.

By Lemmas 2.1 and 2.2, the distribution of $N(t,t^{1/(\alpha-1)}x)$ is the same as the distribution of the number of terms of the sequence $J(R^{-1}(t))/Z(R^{-1}(t))$ that are in $(0,t^{1/(\alpha-1)}x]$. Furthermore, note that if $B_{i,t}$ occurs for $i=1,\dots,4$, then the number of terms of $J(R^{-1}(t))/Z(R^{-1}(t))$ in $(0,t^{1/(\alpha-1)}x]$ is at least $(1-\varepsilon)t^{-1/(\alpha-1)}\gamma F((1-\varepsilon)\gamma x)$ and at most $(1+\varepsilon)t^{-1/(\alpha-1)}\gamma F((1+\varepsilon)\gamma x)$. Therefore, to prove the lemma, it suffices to show that there exists a constant $C$ such that $P(B_{1,t}\cap B_{2,t}\cap B_{3,t}\cap B_{4,t})\ge1-Ct^{1/2}$. We get $P(B_{1,t})\ge1-Ct^{1/2}$ from Lemma 4.2 and $P(B_{2,t})\ge1-Ct^{1/2}$ from Lemma 5.1. It remains to consider $B_{3,t}$ and $B_{4,t}$.

We first bound $P(B_{3,t})$. By Lemma 2.2, if we let $\theta_t=[(\alpha-1)t]^{-1/(\alpha-1)}$, then for all $t>0$ and $x>0$, the number of atoms of $M_t$ with size at most $\theta_t^{-1}x$ has the Poisson distribution with mean $\theta_tF(x)$. Therefore, the number of atoms of $M_{t_-}$ with size at most $(1-\varepsilon)t^{1/(\alpha-1)}x$ has the Poisson distribution with mean

$$\theta_{t_-}F\big(\theta_{t_-}(1-\varepsilon)t^{1/(\alpha-1)}x\big)=\frac{(\alpha\Gamma(\alpha))^{1/(\alpha-1)}}{[(1-t^{1/(2\alpha)})t]^{1/(\alpha-1)}}\,F\big((1-t^{1/(2\alpha)})^{-1/(\alpha-1)}(1-\varepsilon)\gamma x\big)\ge t^{-1/(\alpha-1)}\gamma F((1-\varepsilon)\gamma x).$$

For sufficiently small $t$, we have $(1-t^{1/(2\alpha)})t^{1/(\alpha-1)}x-(1-\varepsilon)t^{1/(\alpha-1)}x>(\varepsilon/2)t^{1/(\alpha-1)}x$. For such $t$, the Markov property implies that conditional on $0<M_{t_-}(\{a\})\le(1-\varepsilon)t^{1/(\alpha-1)}x$, the probability that $0<M_s(\{a\})\le(1-t^{1/(2\alpha)})t^{1/(\alpha-1)}x$ for all $s\in[t_-,t_+]$ is at least

$$1-q\Big((1-\varepsilon)t^{1/(\alpha-1)}x,\ t_+-t_-,\ \Big(\frac\varepsilon2\Big)t^{1/(\alpha-1)}x\Big),$$

where $q$ is the function defined in Lemma 5.1. By Lemma 5.1, there is a constant $C$ such that

$$q\Big((1-\varepsilon)t^{1/(\alpha-1)}x,\ t_+-t_-,\ \Big(\frac\varepsilon2\Big)t^{1/(\alpha-1)}x\Big)\le Ct^{1/(2\alpha)}\varepsilon^{-\alpha},$$

which for sufficiently small $t$ is at most $\varepsilon/2$. Thus, for sufficiently small $t$, the cardinality of $A_3$ has the Poisson distribution with mean at least

$$\Big(1-\frac\varepsilon2\Big)t^{-1/(\alpha-1)}\gamma F((1-\varepsilon)\gamma x).$$

Chebyshev's Inequality now implies that there is a constant $C$ such that for sufficiently small $t$, we have $P(B_{3,t})\ge1-C\varepsilon^{-2}t^{1/(\alpha-1)}$.

We now need to bound $P(B_{4,t})$. The number of atoms of $M_{t_+}$ with size at most $(1+\varepsilon)t^{1/(\alpha-1)}x$ has the Poisson distribution with mean

$$\theta_{t_+}F\big(\theta_{t_+}(1+\varepsilon)t^{1/(\alpha-1)}x\big)=\frac{(\alpha\Gamma(\alpha))^{1/(\alpha-1)}}{[(1+t^{1/(2\alpha)})t]^{1/(\alpha-1)}}\,F\big((1+t^{1/(2\alpha)})^{-1/(\alpha-1)}(1+\varepsilon)\gamma x\big)\le t^{-1/(\alpha-1)}\gamma F((1+\varepsilon)\gamma x).$$

For sufficiently small $t$, we have $(1+\varepsilon)t^{1/(\alpha-1)}x-(1+t^{1/(2\alpha)})t^{1/(\alpha-1)}x>(\varepsilon/2)t^{1/(\alpha-1)}x$. For every value of $a$ such that $0<M_s(\{a\})\le(1+t^{1/(2\alpha)})t^{1/(\alpha-1)}x$ for some $s\in[t_-,t_+]$, we can apply the strong Markov property at the time $\inf\{s\ge t_-:M_s(\{a\})\le(1+t^{1/(2\alpha)})t^{1/(\alpha-1)}x\}$ to see that conditional on $0<M_s(\{a\})\le(1+t^{1/(2\alpha)})t^{1/(\alpha-1)}x$ for some $s\in[t_-,t_+]$, the probability that $0<M_{t_+}(\{a\})\le(1+\varepsilon)t^{1/(\alpha-1)}x$ is at least

$$1-q\Big((1+t^{1/(2\alpha)})t^{1/(\alpha-1)}x,\ t_+-t_-,\ \Big(\frac\varepsilon2\Big)t^{1/(\alpha-1)}x\Big)\ge1-Ct^{1/(2\alpha)}\varepsilon^{-\alpha}>\frac1{1+\varepsilon/2}$$

for sufficiently small $t$. It follows that for sufficiently small $t$, the cardinality of $A_4$ has a Poisson distribution with mean at most

$$\Big(1+\frac\varepsilon2\Big)t^{-1/(\alpha-1)}\gamma F((1+\varepsilon)\gamma x).$$

The desired lower bound on $P(B_{4,t})$ now follows as before from Chebyshev's Inequality. □

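Proof of Theorem 1.4. The proof is now similar to the proof of Proposition 4.4. Fix $x>0$. We will first show that

$$\lim_{t\downarrow0}t^{1/(\alpha-1)}N(t,t^{1/(\alpha-1)}x)=(\alpha\Gamma(\alpha))^{1/(\alpha-1)}F\big((\alpha\Gamma(\alpha))^{1/(\alpha-1)}x\big)\quad\text{a.s.}\tag{26}$$

Let $\varepsilon>0$, and let $t>0$. Let $t_j=t(1-\varepsilon)^j$ for $j=0,1,2,\dots$. Let $B$ be the event that for all $j$, we have

$$(1-\varepsilon)\gamma F((1-\varepsilon)\gamma x)\le t_j^{1/(\alpha-1)}N(t_j,t_j^{1/(\alpha-1)}x)\le(1+\varepsilon)\gamma F((1+\varepsilon)\gamma x),\tag{27}$$

where $\gamma=(\alpha\Gamma(\alpha))^{1/(\alpha-1)}$ as in Lemma 5.2. By Lemma 5.2, we have $P(B)\ge1-Ct^{1/2}$ for some constant $C$ which depends on $\varepsilon$.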

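The number of blocks in the coalescent with asymptotic frequency at most $x$ can only decrease as a result of mergers. Therefore, for each fixed $x$, $N(t,x)$ is a decreasing function of $t$, so if $t_{j+1}<s\le t_j$, then

$$N(t_j,t_{j+1}^{1/(\alpha-1)}x)\le N(s,s^{1/(\alpha-1)}x)\le N(t_{j+1},t_j^{1/(\alpha-1)}x).$$

It follows that

$$(1-\varepsilon)^{1/(\alpha-1)}t_j^{1/(\alpha-1)}N\big(t_j,t_j^{1/(\alpha-1)}(1-\varepsilon)^{1/(\alpha-1)}x\big)\le s^{1/(\alpha-1)}N(s,s^{1/(\alpha-1)}x)\le(1-\varepsilon)^{-1/(\alpha-1)}t_{j+1}^{1/(\alpha-1)}N\big(t_{j+1},t_{j+1}^{1/(\alpha-1)}(1-\varepsilon)^{-1/(\alpha-1)}x\big).$$

Let $B'$ be the event that $B$ occurs and also that (27) holds for all $j\ge0$ with $(1-\varepsilon)^{1/(\alpha-1)}x$ and with $(1-\varepsilon)^{-1/(\alpha-1)}x$ in place of $x$. Note that $P(B')\ge1-Ct^{1/2}$, where $C$ is a constant which depends on $\varepsilon$. Now if $B'$ occurs, then for all $s\le t$, we have

$$(1-\varepsilon)^{1+1/(\alpha-1)}\gamma F\big((1-\varepsilon)^{1+1/(\alpha-1)}\gamma x\big)\le s^{1/(\alpha-1)}N(s,s^{1/(\alpha-1)}x)\le(1+\varepsilon)(1-\varepsilon)^{-1/(\alpha-1)}\gamma F\big((1+\varepsilon)(1-\varepsilon)^{-1/(\alpha-1)}\gamma x\big).$$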

To obtain (26) by letting $t\downarrow0$ and then $\varepsilon\downarrow0$ as in the proof of Proposition 4.4, it remains only to show that $F$ is continuous or, equivalently, that $\mu$ has no atoms. This was proved in [8] for a measure that can be obtained by a rescaling of $\mu$, and we use the same argument here. Suppose $\mu(\{b\})>0$. By Lemma 2.2, we have $P(D(t)=1)>0$ for all $t>0$, and therefore $[(\alpha-1)t]^{1/(\alpha-1)}b$ is an atom of the distribution of $Z(t)$. It then follows by applying the Markov property at time $1-t$ that $[(\alpha-1)t]^{1/(\alpha-1)}b$ is an atom of the distribution of $Z(1)$ for all $t\in(0,1]$, which is a contradiction because a probability distribution has at most countably many atoms.

It remains to establish that the convergence in (26) is uniform in $x$. Let $\varepsilon>0$, and choose $N>2/\varepsilon$. Since $F$ is continuous, we can choose $x_1,\dots,x_{N-1}$ such that $F((\alpha\Gamma(\alpha))^{1/(\alpha-1)}x_i)=i/N$ for all $i$. Also set $x_0=0$ and $x_N=\infty$. By (26), almost surely for sufficiently small $t$ we have $|(\alpha\Gamma(\alpha))^{-1/(\alpha-1)}t^{1/(\alpha-1)}N(t,t^{1/(\alpha-1)}x_i)-i/N|<\varepsilon/2$ for $i=0,\dots,N$. For such $t$, we have

$$\sup_{x\ge0}\Big|(\alpha\Gamma(\alpha))^{-1/(\alpha-1)}t^{1/(\alpha-1)}N(t,t^{1/(\alpha-1)}x)-F\big((\alpha\Gamma(\alpha))^{1/(\alpha-1)}x\big)\Big|<\varepsilon,\tag{28}$$

and the theorem follows. □
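The last step of this proof is deterministic, in the spirit of Pólya's theorem. Take nondecreasing functions that are within $\varepsilon/2$ of a continuous nondecreasing limit at the $N-1$ quantile points, with $N>2/\varepsilon$. Such functions are within $\varepsilon$ of the limit everywhere. The toy Python check below uses hypothetical stand-ins for the objects in the proof. $G(x)=1-e^{-1.01x}$ plays the role of $x\mapsto(\alpha\Gamma(\alpha))^{-1/(\alpha-1)}t^{1/(\alpha-1)}N(t,t^{1/(\alpha-1)}x)$, and the Exp(1) distribution function plays the role of $x\mapsto F((\alpha\Gamma(\alpha))^{1/(\alpha-1)}x)$.

```python
import math


def quantile_grid_check(G, F_inv, eps):
    """Grid step of the proof: with N > 2/eps and x_i = F_inv(i/N) (x_0 = 0, x_N = inf),
    return N and whether |G(x_i) - i/N| < eps/2 holds for every i."""
    N = math.floor(2.0 / eps) + 1
    xs = [0.0] + [F_inv(i / N) for i in range(1, N)] + [math.inf]
    return N, all(abs(G(x) - i / N) < eps / 2 for i, x in enumerate(xs))


def sup_gap(G, F, x_max=60.0, steps=100_000):
    # Approximates sup_{x >= 0} |G(x) - F(x)| on a fine grid, plus the point at infinity.
    pts = [k * x_max / steps for k in range(steps + 1)] + [math.inf]
    return max(abs(G(x) - F(x)) for x in pts)


if __name__ == "__main__":
    F = lambda x: 1.0 - math.exp(-x)
    F_inv = lambda p: -math.log(1.0 - p)
    G = lambda x: 1.0 - math.exp(-1.01 * x)
    print(quantile_grid_check(G, F_inv, 0.1), sup_gap(G, F))
```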
5.3. Size of the block containing 1

We now work towards proving Proposition 1.5, which concerns the distribution of the size of the block containing 1 or, equivalently, the distribution of the size of a size-biased pick from the blocks of the coalescent. We will deduce Proposition 1.5 from Theorem 1.4. We first review some facts about size-biased distributions. If $X=(X_1,X_2,\dots)$ is a sequence of nonnegative random variables whose sum is 1, then a size-biased pick from the sequence $X$ is a random variable $X_N$ such that $P(N=n\mid X)=X_n$. If $(\Theta(t),t\ge0)$ is a ranked $\Lambda$-coalescent with proper frequencies and $K(t)$ is the size of the block containing 1 at time $t$, then $K(t)$ is a size-biased pick from the sequence $\Theta(t)$.

If $X$ is a nonnegative random variable with finite mean, then the size-biased distribution of $X$ is the distribution of the random variable $\hat X$, where

$$E[f(\hat X)]=\frac{E[Xf(X)]}{E[X]}\tag{29}$$

for all nonnegative measurable functions $f$. The next lemma records two facts about size-biased distributions that we will use in the proof of Proposition 1.5.

Lemma 5.3. Suppose $X$ is a nonnegative random variable with mean 1. Let $\hat X$ be a random variable having the size-biased distribution of $X$. Then, for all $y>0$,

$$P(\hat X\le y)=\int_0^y\big(P(X\le y)-P(X\le x)\big)\,dx.\tag{30}$$

Let $\phi(\lambda)=E[e^{-\lambda X}]$. Then for all $\lambda>0$,

$$E[e^{-\lambda\hat X}]=-\phi'(\lambda).\tag{31}$$

Proof. Let $\nu$ denote the distribution of $X$. By (29) and Fubini's theorem,

$$P(\hat X\le y)=E\big[X\mathbf 1_{\{X\le y\}}\big]=\int_0^yz\,\nu(dz)=\int_0^y\int_0^zdx\,\nu(dz)=\int_0^y\int_x^y\nu(dz)\,dx,$$

which leads to (30). To prove (31), note that (29) gives $E[e^{-\lambda\hat X}]=E[Xe^{-\lambda X}]$. It is also easily verified that $-\phi'(\lambda)=E[Xe^{-\lambda X}]$, where we can use the dominated convergence theorem to interchange differentiation and expectation because $E[X]<\infty$. □
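Both identities in Lemma 5.3 can be checked on a concrete example. Take $X$ exponential with mean 1. Its size-biased law has density $xe^{-x}$, a Gamma(2, 1) law, with distribution function $1-e^{-y}(1+y)$ and Laplace transform $(1+\lambda)^{-2}$. The Python sketch below evaluates the right-hand side of (30) with a composite Simpson rule and the right-hand side of (31) with a central difference. The example distribution and the numerical schemes are our own choices.

```python
import math


def simpson(f, a, b, n=2000):
    # Composite Simpson rule on [a, b] with an even number n of subintervals.
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def cdf_exp(x):
    return 1.0 - math.exp(-x)


def cdf_size_biased_via_30(y):
    # Right-hand side of (30) for X ~ Exp(1), which has mean 1.
    return simpson(lambda x: cdf_exp(y) - cdf_exp(x), 0.0, y)


def cdf_gamma2(y):
    # Distribution function of the size-biased law of Exp(1), i.e. Gamma(2, 1).
    return 1.0 - math.exp(-y) * (1.0 + y)


def laplace_size_biased_via_31(lam, h=1e-5):
    # -phi'(lam) for phi(lam) = 1 / (1 + lam), by a central difference.
    phi = lambda s: 1.0 / (1.0 + s)
    return -(phi(lam + h) - phi(lam - h)) / (2.0 * h)


if __name__ == "__main__":
    for y in (0.5, 2.0):
        print(y, cdf_size_biased_via_30(y), cdf_gamma2(y))
```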

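Proof of Proposition 1.5. For $t>0$ and $\varepsilon>0$, let $A_{t,\varepsilon}$ be the event that (28) occurs. Let $(\Theta(t),t\ge0)$ be the ranked coalescent process associated with $(\Pi(t),t\ge0)$, and write $\Theta(t)=(\Theta_1(t),\Theta_2(t),\dots)$. Since $K(t)$ is a size-biased pick from $\Theta(t)$, we have, for all $y>0$,

$$P(K(t)\le y\mid A_{t,\varepsilon})=E\Big[\sum_{i=1}^\infty\Theta_i(t)\mathbf 1_{\{\Theta_i(t)\le y\}}\,\Big|\,A_{t,\varepsilon}\Big]=E\Big[\sum_{i=1}^\infty\int_0^y\mathbf 1_{\{x<\Theta_i(t)\le y\}}\,dx\,\Big|\,A_{t,\varepsilon}\Big]=E\Big[\int_0^y\big(N(t,y)-N(t,x)\big)\,dx\,\Big|\,A_{t,\varepsilon}\Big].$$

Therefore, letting $\gamma=(\alpha\Gamma(\alpha))^{1/(\alpha-1)}$ and substituting $x=\gamma^{-1}t^{1/(\alpha-1)}z$, we get

$$P\big(\gamma t^{-1/(\alpha-1)}K(t)\le y\mid A_{t,\varepsilon}\big)=E\Big[\int_0^{\gamma^{-1}t^{1/(\alpha-1)}y}\big(N(t,\gamma^{-1}t^{1/(\alpha-1)}y)-N(t,x)\big)\,dx\,\Big|\,A_{t,\varepsilon}\Big]=E\Big[\int_0^y\gamma^{-1}t^{1/(\alpha-1)}\big(N(t,\gamma^{-1}t^{1/(\alpha-1)}y)-N(t,\gamma^{-1}t^{1/(\alpha-1)}z)\big)\,dz\,\Big|\,A_{t,\varepsilon}\Big].$$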
Let $Y$ be a random variable with the distribution $\mu$, and let $\hat Y$ have the size-biased distribution of $Y$. By applying the definition of $A_{t,\varepsilon}$ and then Lemma 5.3, we get that there is a number $\vartheta$ such that $-2y\varepsilon\le\vartheta\le2y\varepsilon$ and

$$P\big(\gamma t^{-1/(\alpha-1)}K(t)\le y\mid A_{t,\varepsilon}\big)=\vartheta+\int_0^y\big(F(y)-F(z)\big)\,dz=\vartheta+P(\hat Y\le y).$$

Now, fix $\delta>0$. Choose $\varepsilon$ small enough that $2y\varepsilon<\delta/2$, and then choose $t$ small enough that $P(A_{t,\varepsilon}^c)<\delta/2$, which is possible by Theorem 1.4. Then

$$\big|P\big(\gamma t^{-1/(\alpha-1)}K(t)\le y\big)-P(\hat Y\le y)\big|<\delta.$$

It follows that as $t\downarrow0$, we have $\gamma t^{-1/(\alpha-1)}K(t)\to_d\hat Y$. The proposition now follows from Lemma 5.3, as the formula for the Laplace transform of $X$ in Proposition 1.5 comes from differentiating the right-hand side of (4). □



5.4. The largest block

Next we prove Proposition 1.6. This will require understanding the tail of the distribution $\mu$. A key tool will be the following Tauberian theorem, which comes from Theorem 8.1.6 of [10].

Lemma 5.4. Let $X$ be a nonnegative random variable. For nonnegative integers $n$, let $\mu_n=E[X^n]$. For $\lambda>0$, let $\phi(\lambda)=E[e^{-\lambda X}]$. If $\mu_n<\infty$, let $g_n(\lambda)=\mu_n-(-1)^n\phi^{(n)}(\lambda)$, where $\phi^{(n)}$ denotes the $n$th derivative of $\phi$. Suppose $L$ is a function that is slowly varying at infinity. If $\gamma=n+\beta$, where $0<\beta<1$, then the following are equivalent:

$$g_n(\lambda)\sim\frac{\Gamma(1+\gamma)}{\Gamma(1+\beta)}\,\lambda^\beta L(1/\lambda)\quad\text{as }\lambda\downarrow0,\tag{32}$$

$$P(X>x)\sim\frac{(-1)^n}{\Gamma(1-\gamma)}\,x^{-\gamma}L(x)\quad\text{as }x\to\infty,$$

where $\sim$ means that the ratio of the two sides tends to one.



This leads to the following result concerning the largest atom of $M_t$.

Lemma 5.5. Let $(M_t,t\ge0)$ be the measure-valued process defined in Section 2. Let $J_1(t)$ be the size of the largest atom of $M_t$. Then, for all $x>0$, we have

$$\lim_{t\downarrow0}P\big(J_1(t)\le t^{1/\alpha}x\big)=e^{-(\alpha-1)x^{-\alpha}/\Gamma(2-\alpha)}.$$



Proof. Let $X$ be a random variable with distribution $\mu$, where the Laplace transform of $\mu$ is given by (4). Since $E[X]=1$, we can apply Lemma 5.4 with $n=1$. Defining $\phi$ and $g_1$ as in Lemma 5.4, we get $\phi'(\lambda)=-(1+\lambda^{\alpha-1})^{-\alpha/(\alpha-1)}$ and therefore

$$g_1(\lambda)=1-(1+\lambda^{\alpha-1})^{-\alpha/(\alpha-1)}\sim\frac{\alpha}{\alpha-1}\,\lambda^{\alpha-1},$$

where $\sim$ means that the ratio of the two sides tends to one as $\lambda\downarrow0$. It follows that (32) holds with $\gamma=\alpha$, $\beta=\alpha-1$, and $L(x)=1/(\alpha-1)$ for all $x$. Therefore, by Lemma 5.4, as $x\to\infty$ we have

$$P(X>x)\sim-\frac{x^{-\alpha}}{(\alpha-1)\Gamma(1-\alpha)}=\frac{x^{-\alpha}}{\Gamma(2-\alpha)}.\tag{33}$$



Small-time behavior of beta coalescents 






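By Lemma 2.2, the number of atoms of $M_t$ is Poisson with mean $\theta_t=[(\alpha-1)t]^{-1/(\alpha-1)}$, and the sizes of the atoms have the distribution of $\theta_t^{-1}X$, where $X$ has distribution $\mu$. Therefore, for any $y>0$, the distribution of the number of atoms of size greater than $y$ is Poisson with mean $\theta_tP(X>\theta_ty)$. Since $\theta_tt^{1/\alpha}\to\infty$ as $t\downarrow0$ (because $1/\alpha<1/(\alpha-1)$), (33) gives

$$\lim_{t\downarrow0}\theta_tP\big(X>\theta_tt^{1/\alpha}x\big)=\lim_{t\downarrow0}\frac{\theta_t^{1-\alpha}t^{-1}x^{-\alpha}}{\Gamma(2-\alpha)}=\frac{(\alpha-1)x^{-\alpha}}{\Gamma(2-\alpha)},$$

which implies the result. □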
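The constants in the proof of Lemma 5.5 can be cross-checked numerically. The Python sketch below confirms that $g_1(\lambda)/((\alpha/(\alpha-1))\lambda^{\alpha-1})\to1$ as $\lambda\downarrow0$, and that the tail constant $-1/((\alpha-1)\Gamma(1-\alpha))$ coming from Lemma 5.4 equals $1/\Gamma(2-\alpha)$. The latter holds because $\Gamma(2-\alpha)=(1-\alpha)\Gamma(1-\alpha)$. The sketch also evaluates the limit in Lemma 5.5. It is a numerical sanity check of the algebra only and does not test the Tauberian theorem itself. Names are ours.

```python
import math


def g1(lam, alpha):
    # g_1(lambda) = 1 + phi'(lambda) = 1 - (1 + lambda^(alpha-1))^(-alpha/(alpha-1)),
    # computed with log1p/expm1 so that tiny values are not lost to cancellation.
    return -math.expm1(-(alpha / (alpha - 1.0)) * math.log1p(lam ** (alpha - 1.0)))


def g1_ratio(lam, alpha):
    # Ratio of g_1 to its claimed small-lambda equivalent (alpha/(alpha-1)) lambda^(alpha-1).
    return g1(lam, alpha) / ((alpha / (alpha - 1.0)) * lam ** (alpha - 1.0))


def tail_constant(alpha):
    # Constant in (33) from Lemma 5.4 with n = 1, gamma = alpha, L = 1/(alpha - 1).
    return -1.0 / ((alpha - 1.0) * math.gamma(1.0 - alpha))


def lemma_5_5_limit_cdf(x, alpha):
    # lim_{t -> 0} P(J_1(t) <= t^(1/alpha) x) = exp(-(alpha - 1) x^(-alpha) / Gamma(2 - alpha)).
    return math.exp(-(alpha - 1.0) * x ** (-alpha) / math.gamma(2.0 - alpha))


if __name__ == "__main__":
    for alpha in (1.2, 1.5, 1.8):
        print(alpha, g1_ratio(1e-30, alpha), tail_constant(alpha), 1.0 / math.gamma(2.0 - alpha))
```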

Proof of Proposition 1.6. Let $(Z(t),t\ge0)$ be a CSBP obtained from the measure-valued process $(M_t,t\ge0)$, as defined in Section 2. Let $J_1(t)$ be the size of the largest atom of $M_t$. The distribution of $W(t)$ is the same as the distribution of $J_1(R^{-1}(t))/Z(R^{-1}(t))$ by Lemma 2.1. By Lemma 4.2 and the right-continuity of $Z$, we have $Z(R^{-1}(t))\to1$ a.s. as $t\downarrow0$. Therefore, it suffices to show that for all $x>0$, if $\gamma=(\alpha\Gamma(\alpha)\Gamma(2-\alpha))^{-1/\alpha}$, then

$$\lim_{t\downarrow0}P\big(J_1(R^{-1}(t))\le\gamma t^{1/\alpha}x\big)=e^{-x^{-\alpha}}.\tag{34}$$

We follow a strategy similar to that used in the proof of Lemma 5.2. Let $\varepsilon>0$ and $t>0$. Let $t_-=(1-\varepsilon)t/[\alpha(\alpha-1)\Gamma(\alpha)]$ and let $t_+=(1+\varepsilon)t/[\alpha(\alpha-1)\Gamma(\alpha)]$. Let $B_{1,t}$ be the event that $t_-\le R^{-1}(t)\le t_+$. Let $B_{2,t}$ be the event that for some $a\in[0,1]$, we have $M_s(\{a\})\ge\gamma t^{1/\alpha}x$ for all $s\in[t_-,t_+]$. Let $B_{3,t}$ be the event that $J_1(s)\le\gamma t^{1/\alpha}x$ for all $s\in[t_-,t_+]$. Note that on the event $B_{1,t}\cap B_{2,t}$, we have $J_1(R^{-1}(t))\ge\gamma t^{1/\alpha}x$, while on the event $B_{1,t}\cap B_{3,t}$, we have $J_1(R^{-1}(t))\le\gamma t^{1/\alpha}x$. Also, note that $\lim_{t\downarrow0}P(B_{1,t})=1$ by Lemma 4.2.

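Recall the definition of $q(a,t,\varepsilon)$ from Lemma 5.1. Using the Markov property for CSBPs at time $t_-$, we get

$$P(B_{2,t})\ge P\big(J_1(t_-)>(1+\varepsilon)\gamma t^{1/\alpha}x\big)\Big(1-q\big((1+\varepsilon)\gamma t^{1/\alpha}x,\ t_+-t_-,\ \varepsilon\gamma t^{1/\alpha}x\big)\Big).\tag{35}$$

By Lemma 5.5 and the definition of $t_-$,

$$\lim_{t\downarrow0}P\big(J_1(t_-)>(1+\varepsilon)\gamma t^{1/\alpha}x\big)=1-e^{-x^{-\alpha}(1-\varepsilon)(1+\varepsilon)^{-\alpha}}.\tag{36}$$

By Lemma 5.1,

$$\lim_{t\downarrow0}q\big((1+\varepsilon)\gamma t^{1/\alpha}x,\ t_+-t_-,\ \varepsilon\gamma t^{1/\alpha}x\big)=0.\tag{37}$$

Combining (35), (36), and (37), we get

$$\limsup_{t\downarrow0}P\big(J_1(R^{-1}(t))\le\gamma t^{1/\alpha}x\big)\le e^{-x^{-\alpha}(1-\varepsilon)(1+\varepsilon)^{-\alpha}}.\tag{38}$$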

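To estimate $P(B_{3,t})$, we first define $B_{4,t}$ to be the event that $J_1(t_+)\le(1-\varepsilon)\gamma t^{1/\alpha}x$. By applying the strong Markov property at the stopping time $T=\inf\{s\ge t_-:J_1(s)>\gamma t^{1/\alpha}x\}$, we see that

$$P(B_{4,t}\mid B_{3,t}^c)\le q\big((1-\varepsilon)\gamma t^{1/\alpha}x,\ t_+-t_-,\ \varepsilon\gamma t^{1/\alpha}x\big).\tag{39}$$

Therefore,

$$P(B_{3,t})\ge P(B_{4,t}\cap B_{3,t})=P(B_{4,t})-P(B_{4,t}\cap B_{3,t}^c)\ge P(B_{4,t})-P(B_{4,t}\mid B_{3,t}^c),$$

and the last term tends to zero as $t\downarrow0$ by (39) and Lemma 5.1, so $\liminf_{t\downarrow0}P(B_{3,t})\ge\liminf_{t\downarrow0}P(B_{4,t})$. Combining this result with Lemma 5.5, we get

$$\liminf_{t\downarrow0}P\big(J_1(R^{-1}(t))\le\gamma t^{1/\alpha}x\big)\ge\liminf_{t\downarrow0}P(B_{4,t})=e^{-x^{-\alpha}(1+\varepsilon)(1-\varepsilon)^{-\alpha}}.\tag{40}$$

Combining (38) and (40) and letting $\varepsilon\downarrow0$ gives (34). □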
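The normalization $\gamma=(\alpha\Gamma(\alpha)\Gamma(2-\alpha))^{-1/\alpha}$ in (34) is chosen as follows. Write $s=t/[\alpha(\alpha-1)\Gamma(\alpha)]$, the value around which $R^{-1}(t)$ concentrates by Lemma 4.2. Then $\gamma t^{1/\alpha}x=s^{1/\alpha}y$ with $y=\gamma(\alpha(\alpha-1)\Gamma(\alpha))^{1/\alpha}x$, and the limit in Lemma 5.5 evaluated at $y$ equals $e^{-x^{-\alpha}}$. The Python sketch below checks this algebra numerically, heuristically setting $\varepsilon=0$ in $t_\pm$. It is bookkeeping only, not a substitute for the proof, and the names are ours.

```python
import math


def gamma_normalizer(alpha):
    # gamma = (alpha Gamma(alpha) Gamma(2 - alpha))^(-1/alpha), as in (34).
    return (alpha * math.gamma(alpha) * math.gamma(2.0 - alpha)) ** (-1.0 / alpha)


def lemma_5_5_cdf(y, alpha):
    # Limit of P(J_1(s) <= s^(1/alpha) y) as s -> 0 (Lemma 5.5).
    return math.exp(-(alpha - 1.0) * y ** (-alpha) / math.gamma(2.0 - alpha))


def limit_34(x, alpha):
    # With s = t / [alpha (alpha - 1) Gamma(alpha)], gamma t^(1/alpha) x = s^(1/alpha) y.
    y = gamma_normalizer(alpha) * (alpha * (alpha - 1.0) * math.gamma(alpha)) ** (1.0 / alpha) * x
    return lemma_5_5_cdf(y, alpha)


if __name__ == "__main__":
    for alpha in (1.2, 1.5, 1.8):
        print(alpha, limit_34(1.3, alpha), math.exp(-1.3 ** (-alpha)))
```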
6. Hausdorff and packing dimensions

In this section, we prove Theorem 1.7. Because it is known that $\dim_H(S)\le\dim_P(S)$, it suffices to show that $\dim_P(S)\le1/(\alpha-1)$ a.s. and that $\dim_H(S)\ge1/(\alpha-1)$ a.s. We begin with the upper bound on the packing dimension, which is easier and follows from the same argument given by Evans [20] for Kingman's coalescent.

Proposition 6.1. We have $\dim_P(S)\le1/(\alpha-1)$ a.s.



Proof. By (8), it suffices to show that for all (3 > l/(a — 1), we have pp(S) < oo, where pp(S) is as defined 
in (7). Since S itself is a cover of S, it suffices to show that Pp(S) < oo for all (3 > l/(a — 1), where Pp(S) is 
as defined in (6). For this it suffices to show that for all (3 > l/(a — 1), almost surely we can find 5 > and 
K < oo such that if {V^}^ 1 is a 5-packing of S, then 1^1^ < K <oo. 

Let $N(t)$ be the number of blocks of $\Pi(t)$, and let $\beta > 1/(\alpha-1)$. By Theorem 1.1, almost surely there exists a $\delta > 0$ such that for all $t \in (0, \delta)$, we have $N(t) \le C t^{-1/(\alpha-1)}$, where $C = 2[\alpha/(A\Gamma(2-\alpha))]^{1/(\alpha-1)}$. Let $\{V_i\}_{i=1}^\infty$ be a $\delta$-packing of $S$, and let $D_k = \#\{i : 2^{-(k+1)}\delta < |V_i| \le 2^{-k}\delta\}$. For all $x \in S$, let $B(x, r)$ denote the closed ball of radius $r$ centered at $x$. Note that this ball has diameter at most $r$ because $(S, d)$ is an ultrametric space. By construction, for all $t > 0$, the set $S$ is a union of $N(t)$ balls of the form $B(x, t)$, each containing the integers in one of the blocks of $\Pi(t)$. If $s > t$, then any open ball of radius $s$ must have its center in one of the $N(t)$ closed balls of radius $t$, and therefore must contain one of the $N(t)$ balls of radius $t$. It follows that $D_k \le N(2^{-(k+1)}\delta)$. Thus,

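$$\sum_{i=1}^{\infty} |V_i|^\beta \le \sum_{k=0}^{\infty} D_k (2^{-k}\delta)^\beta \le \sum_{k=0}^{\infty} N(2^{-(k+1)}\delta)(2^{-k}\delta)^\beta \le \sum_{k=0}^{\infty} C\delta^{-1/(\alpha-1)+\beta}\, 2^{(k+1)/(\alpha-1) - k\beta} \le K < \infty,$$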

where $K < \infty$ because $\beta > 1/(\alpha-1)$. □
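The series in the proof of Proposition 6.1 is geometric with ratio $2^{1/(\alpha-1)-\beta}$, so the bound $K$ has a closed form. The short Python sketch below (function names are hypothetical, and the values of $\alpha$, $\beta$, $\delta$, $C$ are arbitrary illustrations, not quantities from the paper) evaluates that closed form, compares it with a long partial sum, and returns infinity when $\beta \le 1/(\alpha-1)$.

```python
import math

def packing_sum_bound(alpha: float, beta: float, delta: float, C: float) -> float:
    """Closed form of sum_{k>=0} C * delta**(beta - d) * 2**((k+1)*d - k*beta), d = 1/(alpha-1)."""
    d = 1.0 / (alpha - 1.0)                  # threshold exponent 1/(alpha - 1)
    ratio = 2.0 ** (d - beta)                # ratio between consecutive terms
    if ratio >= 1.0:
        return math.inf                      # geometric series diverges when beta <= d
    first_term = C * delta ** (beta - d) * 2.0 ** d   # the k = 0 term
    return first_term / (1.0 - ratio)

def packing_partial_sum(alpha: float, beta: float, delta: float, C: float, terms: int) -> float:
    """Direct partial sum of the same series, for comparison."""
    d = 1.0 / (alpha - 1.0)
    return sum(C * delta ** (beta - d) * 2.0 ** ((k + 1) * d - k * beta) for k in range(terms))

# Illustrative values only: alpha = 1.5 gives the threshold 1/(alpha - 1) = 2.
print(packing_sum_bound(1.5, 2.5, 0.1, 3.0), packing_partial_sum(1.5, 2.5, 0.1, 3.0, 400))
print(packing_sum_bound(1.5, 2.0, 0.1, 3.0))   # prints inf: beta is not above the threshold
```

Larger $\beta$ only shrinks the ratio, which is why the argument needs nothing beyond $\beta > 1/(\alpha-1)$.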



It remains to prove a lower bound on the Hausdorff dimension. Our argument is motivated by the proof 
of Theorem 5.5 in [17]. We will use the following lemma, which is essentially Proposition 4.9a in [21]. 
Proposition 4.9a in [21] is stated for Euclidean space, but the same proof works in general metric spaces. 



Lemma 6.2. Let $\gamma$ be a probability measure on $S$. If
$$\limsup_{r \downarrow 0} \frac{\gamma(B(x, r))}{r^\beta} \le C \quad \text{for } \gamma\text{-almost all } x \in S,$$
then $\dim_H(S) \ge \beta$.



To apply Lemma 6.2, it will be necessary to construct a suitable probability measure $\gamma$ on $S$. We will use the same approach used by Evans for Kingman's coalescent in Section 5 of [20]. If $B(x, t)$ is a ball in $S$, then $B(x, t)$ contains the integers in one of the $N(t)$ blocks of $\Pi(t)$, and all of these integers are centers of the ball. Consequently, every ball in $S$ can be written as $B(n, t)$, where $n \in \mathbb{N}$, and since the coalescent process $(\Pi(t), t \ge 0)$ has jumps only at a countable set of times, only countably many of these balls are distinct. Given $n \in \mathbb{N}$ and $t > 0$, define $\gamma(B(n, t))$ to be the asymptotic frequency of the block of $\Pi(t)$ containing $n$. If $s > t$, then the block $B$ of $\Pi(s)$ containing $n$ is a union of finitely many blocks $B_1, \ldots, B_k$ of $\Pi(t)$, and if $n_i \in B_i$ for $i = 1, \ldots, k$, then $B(n, s) = \bigcup_{i=1}^k B(n_i, t)$. Since the asymptotic frequency of $B$ is the sum of the asymptotic frequencies of $B_1, \ldots, B_k$, the function $\gamma$ can easily be extended to a finitely additive set function on the collection $\mathcal{B}$ consisting of the finite unions of balls. One can easily check that the complement of a finite union of balls in $S$ is also a finite union of balls, so $\mathcal{B}$ is an algebra. Every open subset of $S$ can be written as a union of balls, and therefore as a countable union of balls centered at one of the integers, so the $\sigma$-algebra generated by $\mathcal{B}$ is the Borel $\sigma$-algebra. Since $S$ is complete and almost surely can be covered by finitely many balls of radius $t$ for any $t > 0$, we have that $S$ is compact almost surely. As noted in [20], it then follows from Theorems 3.1.1 and 3.1.4 in [15] that $\gamma$ can be extended to a probability measure on $S$.
Lemma 6.3. Let $K(t)$ be the asymptotic frequency of the block of $\Pi(t)$ containing 1. Let $X$ be a point of $S$ chosen at random with distribution $\gamma$. Then the processes $(K(t), t \ge 0)$ and $(\gamma(B(X, t)), t \ge 0)$ have the same law.

Proof. Clearly for all positive integers $i$ and $n$, the indicator of the event that $i$ and $n$ are in the same block of $\Pi(t)$ is measurable. Therefore, $\gamma(B(n, t))$ is measurable for all positive integers $n$, as it can be expressed as a limit of an average of these indicators. It follows that $\gamma(B(X, t))$ is also measurable. The processes $(K(t), t \ge 0)$ and $(\gamma(B(X, t)), t \ge 0)$ are both nondecreasing, so it suffices to show that they have the same finite-dimensional distributions. Since $K(0) = \gamma(B(X, 0)) = 0$ a.s., it suffices to show that, for all $t > 0$, the processes $(K(s), s \ge t)$ and $(\gamma(B(X, s)), s \ge t)$ have the same law.

Let $(\Pi'(t), t \ge 0)$ be the restriction of $(\Pi(t), t \ge 0)$ to $\mathbb{N}' = \{2, 3, \ldots\}$, meaning that if $i, j \ge 2$, then $i$ and $j$ are in the same block of $\Pi'(t)$ if and only if they are in the same block of $\Pi(t)$. Let $d'$ be the restriction of the metric $d$ to $\mathbb{N}'$, and let $(S', d')$ be the completion of $(\mathbb{N}', d')$. Since the $\Lambda$-coalescent has proper frequencies, the integer 1 is not a singleton in $\Pi(t)$ for any $t > 0$. Therefore, if $(t_k)_{k=1}^\infty$ is a sequence of positive numbers converging to zero, for each $k$ there is an integer $n_k \ge 2$ such that 1 and $n_k$ are in the same block of $\Pi(t_k)$. It follows that $d(1, n_k) \le t_k$ for all $k$, so $n_k \to 1$ in $S$. Since $(S', d')$ is complete, it follows that the metric spaces $(S, d)$ and $(S', d')$ are isometric, except that the point labeled 1 in $S$ is unlabeled in $S'$. Thus, we can also view $\gamma$ as a probability measure on $S'$.

Fix $t > 0$. Let $n_1, \ldots, n_{N(t)}$ be the smallest integers in the $N(t)$ blocks of $\Pi'(t)$. Note that if $x \in B(n_k, t)$, then $B(x, s) = B(n_k, s)$ for all $s \ge t$. Since $X$ has distribution $\gamma$, the probability that $X$ is in $B(n_k, t)$, conditional on $(\Pi(u), u \ge 0)$, equals the asymptotic frequency of the block of $\Pi(t)$ containing $n_k$. Likewise, since $\Pi(t)$ is an exchangeable random partition, the probability that 1 is in $B(n_k, t)$, conditional on $(\Pi'(u), u \ge 0)$, is the asymptotic frequency of the block of $\Pi(t)$ containing $n_k$. Since, whenever $x, y \in B(n_k, t)$, we have $\gamma(B(x, s)) = \gamma(B(y, s))$ for all $s \ge t$, it follows that the processes $(\gamma(B(1, s)), s \ge t)$ and $(\gamma(B(X, s)), s \ge t)$ have the same law. The lemma follows because $K(s) = \gamma(B(1, s))$ for all $s$. □
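The heart of Lemma 6.3 is that, by exchangeability, the integer 1 lands in a given block of $\Pi(t)$ with probability equal to that block's asymptotic frequency, exactly as a $\gamma$-distributed point $X$ lands in the corresponding ball. The Monte Carlo sketch below assumes, for illustration only, that $\Pi(t)$ is produced by Kingman's paintbox with finitely many proper frequencies (the values are arbitrary); both $K(t)$ and $\gamma(B(X,t))$ then have mean $\sum_j p_j^2$. Helper names are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def gamma_ball_mean(freqs):
    """E[gamma(B(X, t))] when X ~ gamma falls in the ball of block j with probability freqs[j]."""
    return sum(p * p for p in freqs)

def block_of_one_frequency(freqs, n, rng):
    """Paintbox sample of an exchangeable partition of {1, ..., n} with proper block
    frequencies `freqs`; returns the empirical frequency of the block containing 1."""
    cuts = list(accumulate(freqs))                 # right endpoints of the paintbox intervals
    last = len(cuts) - 1
    labels = [min(bisect_right(cuts, rng.random()), last) for _ in range(n)]  # clamp guards rounding near 1
    return labels.count(labels[0]) / n             # labels[0] is the block of the integer 1

def mean_block_of_one(freqs, n=200, reps=3000, seed=1):
    """Monte Carlo estimate of E[K(t)] for the paintbox partition above."""
    rng = random.Random(seed)
    return sum(block_of_one_frequency(freqs, n, rng) for _ in range(reps)) / reps

freqs = [0.5, 0.3, 0.2]                            # arbitrary illustrative frequencies
print(gamma_ball_mean(freqs), mean_block_of_one(freqs))
```

The empirical frequency counts 1 itself, so for finite `n` the estimate sits slightly above $\sum_j p_j^2$; the gap is of order $1/n$.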

The next proposition, combined with Proposition 6.1, proves Theorem 1.7. 

Proposition 6.4. We have $\dim_H(S) \ge 1/(\alpha-1)$ a.s.

Proof. Fix $\beta < 1/(\alpha-1)$. We need to show that $\dim_H(S) \ge \beta$, and by Lemmas 6.2 and 6.3, it suffices to show that for some constant $C$, we have
$$\limsup_{t \downarrow 0} t^{-\beta} K(t) \le C \quad \text{a.s.}, \tag{41}$$
where $K(t)$ is the asymptotic frequency of the block containing 1 at time $t$.

Let $\Psi$ be a Poisson point process on $[0, \infty) \times \{0,1\}^\infty$ with intensity measure $dt \times L(d\xi)$, where $L(B) = \int_{(0,1]} Q_x(B)\, x^{-2}\, \Lambda(dx)$ for all measurable $B$ and $Q_x$ is the distribution of an infinite sequence of independent $\{0,1\}$-valued random variables that are one with probability $x$ and zero with probability $1 - x$. We may assume that the coalescent process $(\Pi(t), t \ge 0)$ is constructed from $\Psi$ as described in Section 3. Choose a real number $b$ such that $\beta < b < 1/(\alpha-1)$. Obtain a new Poisson point process $\Psi^*$ by removing from $\Psi$ all points $(t, \xi)$ such that $\xi_1 = 1$ and
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \xi_i > t^b.$$
Then define a new coalescent process $(\Pi^*(t), t \ge 0)$ from $\Psi^*$, again using the procedure described in Section 3. Choose $\delta > 0$ such that $f(x) \le 2A x^{1-\alpha}$ for all $x \in (0, \delta]$. The number of points in $[0, t] \times \{0,1\}^\infty$ that are in $\Psi$ but not $\Psi^*$ has a Poisson distribution with mean at most
$$\delta^{-1}\Lambda([\delta, 1])\, t + \frac{2A}{(\alpha-1)(b(1-\alpha)+1)}\, t^{b(1-\alpha)+1}.$$

Since $b(1-\alpha) + 1 > 0$, this expression goes to zero as $t \downarrow 0$. It follows that there almost surely exists a $t^* > 0$ such that the processes $\Psi$ and $\Psi^*$ are the same on $[0, t^*] \times \{0,1\}^\infty$. Therefore, $\Pi(t) = \Pi^*(t)$ for all $t \le t^*$, so it suffices to show (41) with $K(t)$ replaced by $K^*(t)$, where $K^*(t)$ is the asymptotic frequency of the block containing 1 in $\Pi^*(t)$.
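To see how slowly or quickly the removed points disappear, the sketch below evaluates the mean bound $\delta^{-1}\Lambda([\delta,1])\,t + \frac{2A}{(\alpha-1)(b(1-\alpha)+1)}\,t^{b(1-\alpha)+1}$, which follows from $f(x) \le 2Ax^{1-\alpha}$ on $(0,\delta]$ by integrating $x^{-1}\Lambda(dx)$ over $x > s^b$ and $s \in [0,t]$, and prints $e^{-\mu}$, a lower bound on the probability that no point is removed on $[0,t]$. The parameter values (including the stand-in for $\Lambda([\delta,1])$) are hypothetical.

```python
import math

def removed_mean_bound(t, alpha, A, b, delta, lam_tail):
    """Upper bound on the Poisson mean of the points of Psi in [0, t] that are not in Psi*.

    lam_tail plays the role of Lambda([delta, 1]); the second term assumes
    f(x) <= 2 A x**(1 - alpha) on (0, delta], as in the proof.
    """
    e = b * (1.0 - alpha) + 1.0          # exponent b(1 - alpha) + 1
    if e <= 0.0:
        raise ValueError("need b < 1/(alpha - 1) so that the exponent is positive")
    return lam_tail * t / delta + 2.0 * A / ((alpha - 1.0) * e) * t ** e

# Hypothetical parameters: alpha = 1.5, so any b < 2 is allowed; here b = 1.5.
for t in (1e-1, 1e-3, 1e-6, 1e-20):
    mu = removed_mean_bound(t, alpha=1.5, A=1.0, b=1.5, delta=0.1, lam_tail=0.5)
    # exp(-mu) is a lower bound on P(no point of Psi is removed on [0, t])
    print(f"t={t:.0e}  mean bound={mu:.3g}  P(no removal) >= {math.exp(-mu):.3g}")
```

With $b$ close to $1/(\alpha-1)$ the exponent $b(1-\alpha)+1$ is small, so the bound decays slowly; the proof only needs it to vanish.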

The partition $\Pi^*(t)$ is not exchangeable because only points $(t, \xi)$ with $\xi_1 = 1$ are removed from $\Psi$. However, the sequence whose $k$th term is the indicator of the event that 1 and $k + 1$ are in the same block of $\Pi^*(t)$ is exchangeable. Therefore, the asymptotic frequency $K^*(t)$ of the block containing 1 exists almost surely, and its expected value is the probability that 1 and 2 are in the same block of $\Pi^*(t)$. This probability is bounded by the expected number of points $(s, \xi)$ of $\Psi^*$ such that $s \le t$ and $\xi_1 = \xi_2 = 1$. Therefore, for $t$ small enough that $t^b < \delta$,

$$E[K^*(t)] \le \int_0^t \int_{(0, s^b]} \Lambda(dx)\, ds \le t\,\Lambda((0, t^b]) \le 2At \int_0^{t^b} x^{1-\alpha}\, dx = \frac{2A}{2-\alpha}\, t^{1+b(2-\alpha)}.$$

Therefore, by Markov's inequality, for all $\varepsilon > 0$, there exists a constant $C^*$ such that for all $t > 0$, we have $P(t^{-\beta} K^*(t) > \varepsilon) \le C^* t^\eta$, where $\eta = 1 + b(2-\alpha) - \beta > 1 + b(2-\alpha) - b = 1 + b(1-\alpha) > 0$. It follows that
$$\sum_{k=1}^{\infty} P\big(2^{k\beta} K^*(2^{-k}) > \varepsilon\big) \le C^* \sum_{k=1}^{\infty} 2^{-k\eta} < \infty.$$

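By the Borel-Cantelli lemma, almost surely we have $2^{k\beta} K^*(2^{-k}) \le \varepsilon$ for all sufficiently large $k$. Since the process $(K^*(t), t \ge 0)$ is nondecreasing, for $2^{-(k+1)} \le t \le 2^{-k}$ we have $t^{-\beta} K^*(t) \le 2^{(k+1)\beta} K^*(2^{-k})$, so this implies (41) with $K^*$ in place of $K$ and $C = 2^\beta \varepsilon$. □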
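The only arithmetic the proof needs is an admissible $b$ with $\beta < b < 1/(\alpha-1)$ and a positive exponent $\eta = 1 + b(2-\alpha) - \beta$, which makes the dyadic series summable. The sketch below picks the midpoint for $b$ (an arbitrary choice; any value in the interval works) and evaluates the closed form $C^*/(2^\eta - 1)$ of $C^*\sum_{k\ge1} 2^{-k\eta}$ with a placeholder $C^* = 1$.

```python
def choose_b_and_eta(alpha, beta):
    """Pick b strictly between beta and 1/(alpha - 1) (the midpoint, an arbitrary choice)
    and return (b, eta) with eta = 1 + b(2 - alpha) - beta."""
    d = 1.0 / (alpha - 1.0)
    if not beta < d:
        raise ValueError("need beta < 1/(alpha - 1)")
    b = 0.5 * (beta + d)
    eta = 1.0 + b * (2.0 - alpha) - beta
    return b, eta

def dyadic_series(eta, C_star=1.0):
    """Closed form of sum_{k>=1} C* 2**(-k*eta); finite because eta > 0."""
    return C_star / (2.0 ** eta - 1.0)

b, eta = choose_b_and_eta(alpha=1.5, beta=1.9)
print(b, eta, dyadic_series(eta))
```

Because $\eta$ exceeds $1 + b(1-\alpha)$, it stays positive however close $\beta$ is to $1/(\alpha-1)$, though the series constant grows as $\eta \downarrow 0$.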



7. Dynamics of the number of blocks 



In this section, we prove Theorems 1.8 and 1.9. Our first lemma gives a bound on the probabilities $\zeta_{n,k}$ which is uniform in $n$, under the additional assumption that the density of $\Lambda$ is bounded above and below by constant multiples of $x^{1-\alpha}$.

Lemma 7.1. Assume that the assumptions of Theorem 1.8 hold, and that in addition there are constants $0 < C_1 < C_2 < \infty$ such that the function $f$ satisfies
$$C_1 x^{1-\alpha} \le f(x) \le C_2 x^{1-\alpha} \tag{42}$$
for all $x \in (0, 1]$. Then there exists a constant $C$ such that $\zeta_{n,k} \le C k^{-1-\alpha}$ for all positive integers $n$ and $k$ such that $k \le n - 1$.



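Proof. For $2 \le k \le n$, we have
$$\binom{n}{k} \lambda_{n,k} \le C_2 \binom{n}{k} \int_0^1 x^{k-1-\alpha}(1-x)^{n-k}\, dx = C_2\, \frac{n!\,\Gamma(k-\alpha)}{k!\,\Gamma(n+1-\alpha)} \le C_3 n^\alpha k^{-1-\alpha}, \tag{43}$$
where $C_3$ is a constant that does not depend on $n$ or $k$. The same argument gives $\binom{n}{k}\lambda_{n,k} \ge C_4 n^\alpha k^{-1-\alpha}$, where $C_4$ is another constant, so
$$\sum_{k=2}^{n} \binom{n}{k} \lambda_{n,k} \ge C_4 n^\alpha \sum_{k=2}^{n} k^{-1-\alpha} \ge C_5 n^\alpha \tag{44}$$
for some constant $C_5$. Since $\zeta_{n,k} = \binom{n}{k+1}\lambda_{n,k+1} \big/ \sum_{j=2}^{n}\binom{n}{j}\lambda_{n,j}$, Eqs (43) and (44) give the result. □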
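Lemma 7.1 can be checked numerically in a case where everything is explicit. We choose, purely for illustration, $\Lambda(dx) = x^{1-\alpha}\,dx$ on $(0,1]$, which satisfies (42) with any $C_1 < 1 < C_2$. Then $\binom{n}{j}\lambda_{n,j} = n!\,\Gamma(j-\alpha)/(j!\,\Gamma(n+1-\alpha))$, and using the standard identity $\zeta_{n,k} = \binom{n}{k+1}\lambda_{n,k+1}/\sum_{j=2}^n\binom{n}{j}\lambda_{n,j}$ (the next merger involves $k+1$ of the $n$ blocks), the factors $n!/\Gamma(n+1-\alpha)$ cancel. The sketch (function name hypothetical) confirms that $k^{1+\alpha}\zeta_{n,k}$ stays bounded.

```python
import math

def zeta_row(n, alpha):
    """zeta_{n,k} for k = 1, ..., n - 1 when Lambda(dx) = x**(1 - alpha) dx on (0, 1].

    For this Lambda, binom(n, j) lambda_{n,j} = n! Gamma(j - alpha) / (j! Gamma(n + 1 - alpha)),
    so zeta_{n,k} = w[k + 1] / sum_{j=2}^{n} w[j] with w[j] = Gamma(j - alpha) / j!.
    """
    w = [0.0, 0.0, math.gamma(2.0 - alpha) / 2.0]   # w[0], w[1] unused; w[2] = Gamma(2 - alpha)/2!
    for j in range(2, n):
        w.append(w[j] * (j - alpha) / (j + 1))       # w[j+1] = w[j] (j - alpha)/(j + 1)
    total = sum(w[2:n + 1])
    return [w[k + 1] / total for k in range(1, n)]   # entry k - 1 is zeta_{n,k}

alpha = 1.5
worst = max(k ** (1 + alpha) * z
            for n in range(2, 300)
            for k, z in enumerate(zeta_row(n, alpha), start=1))
print(worst)   # stays bounded, consistent with zeta_{n,k} <= C k^(-1-alpha)
```

For this $\Lambda$ the row $\zeta_{n,\cdot}$ is a renormalized copy of a single sequence, so $\zeta_{n,1}$ increases to $\alpha/2$ as $n \to \infty$.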
Lemma 7.2. Let $V_{n,k}$ be the event that $N(t) \in \{n, n+1, \ldots, k\}$ for some $t$. Under the assumptions of Theorem 1.8, for all $\varepsilon > 0$ there exists a positive integer $K$ such that for all positive integers $n$, we have $P(V_{n,n+k}^c) < \varepsilon$ whenever $k \ge K$.

Proof. We first assume that the density $f$ satisfies (42) for all $x \in (0, 1]$. Recall that $V_n = V_{n,n}$ is the event that $N(t) = n$ for some $t$. For all positive integers $m$, let $B_m$ be a random variable such that $P(B_m = j) = \zeta_{m,j}$ for all $j \le m - 1$. Note that

$$P(V_{n,n+k}^c \cap V_{n+k+1}) \le P(V_{n,n+k}^c \mid V_{n+k+1}) = P(B_{n+k+1} \ge k+2) = \sum_{j=k+2}^{n+k} \zeta_{n+k+1,j} \le \frac{C}{\alpha}(k+1)^{-\alpha}, \tag{45}$$

where $C$ is the constant from Lemma 7.1. Note that almost surely the coalescent must have more than $n$ blocks for sufficiently small $t$ because the coalescent process restricted to $\{1, \ldots, n+1\}$ almost surely holds in its initial state for a positive amount of time. Therefore, if $V_{n,n+k}^c$ occurs, then $V_{n,n+k+i}^c \cap V_{n+k+i+1}$ must occur for some nonnegative integer $i$. By (45),

$$P(V_{n,n+k}^c) \le \sum_{i=0}^{\infty} P(V_{n,n+k+i}^c \cap V_{n+k+i+1}) \le \frac{C}{\alpha} \sum_{i=0}^{\infty} (k+i+1)^{-\alpha} \le \frac{C}{\alpha(\alpha-1)}\, k^{1-\alpha}.$$

Because this bound does not depend on $n$, the conclusion of the lemma holds when $f$ satisfies (42) for all $x \in (0, 1]$.

Next, consider the general case, in which $f$ need not satisfy (42) on all of $(0, 1]$. The assumptions imply that there exists a $\delta > 0$ such that $f$ satisfies (42) for $x \in (0, \delta]$. Define a measure $\Lambda_1$ on $[0, 1]$ by $\Lambda_1(dx) = (f(x) 1_{\{x \le \delta\}} + x^{1-\alpha} 1_{\{\delta < x \le 1\}})\, dx$. Let $(\Pi_1(t), t \ge 0)$ be a $\Lambda_1$-coalescent. By Lemma 3.1, we may assume that the coalescent processes $(\Pi(t), t \ge 0)$ and $(\Pi_1(t), t \ge 0)$ are coupled so that there almost surely exists a random time $\tau > 0$ such that $\Pi(s) = \Pi_1(s)$ for all $s \le \tau$. It follows that there exists a fixed time $u > 0$ such that $P(\Pi(s) = \Pi_1(s) \text{ for all } s \le u) > 1 - \varepsilon/3$. Furthermore, there exists an integer $M$ such that $P(N(u) < M) > 1 - \varepsilon/3$. By our result for the case in which $f$ satisfies (42) for all $x \in (0, 1]$, there exists an integer $L$ such that whenever $k \ge L$, the probability that the number of blocks of $\Pi_1(t)$ lies in $\{n, n+1, \ldots, n+k\}$ for some $t$ is at least $1 - \varepsilon/3$ for all $n$. However, if $\Pi(s) = \Pi_1(s)$ for all $s \le u$ and $N(u) < k$, then $V_{n,n+k}$ occurs if and only if the number of blocks of $\Pi_1(t)$ lies in $\{n, n+1, \ldots, n+k\}$ for some $t$. The result now follows by taking $K = \max\{L, M\}$. □
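The first half of the proof gives an explicit, $n$-free bound $\frac{C}{\alpha(\alpha-1)}k^{1-\alpha}$ on the probability of skipping the window, so a suitable $K$ can be read off directly. The sketch below computes such a $K$ for a given $\varepsilon$ and checks the integral comparison used for $\sum_{i\ge0}(k+i+1)^{-\alpha}$. The constant $C$ of Lemma 7.1 is not computed in the text, so `C=1.0` is only a placeholder, and the helper names are hypothetical.

```python
import math

def skip_bound(k, alpha, C):
    """Bound on P(V^c_{n,n+k}) from the proof: C / (alpha (alpha - 1)) * k**(1 - alpha), uniform in n."""
    return C / (alpha * (alpha - 1.0)) * k ** (1.0 - alpha)

def window_size(eps, alpha, C):
    """An integer K with skip_bound(K, alpha, C) < eps (the smallest one, up to float rounding)."""
    x = (C / (alpha * (alpha - 1.0) * eps)) ** (1.0 / (alpha - 1.0))
    return max(math.floor(x) + 1, 1)

def tail_partial_sum(k, alpha, terms):
    """Partial sum of sum_{i>=0} (k + i + 1)**(-alpha), the series bounded by an integral in the proof."""
    return sum((k + i + 1) ** (-alpha) for i in range(terms))

K = window_size(0.01, alpha=1.5, C=1.0)   # C = 1 is a placeholder for the constant of Lemma 7.1
print(K, skip_bound(K, 1.5, 1.0))
```

The required window grows like $\varepsilon^{-1/(\alpha-1)}$, which is large when $\alpha$ is close to 1.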

Proof of Theorem 1.8. Let $(X_i)_{i=1}^\infty$ be a sequence of i.i.d. random variables with distribution $\zeta$. Let $S_0 = 0$, and for all positive integers $n$, let $S_n = \sum_{i=1}^n X_i$. For positive integers $k$, let $M_k = \max\{n \ge 0 : S_n \le k\}$, and then define the age by $A_k = k - S_{M_k}$. The process $(A_k)_{k=1}^\infty$ is an irreducible Markov chain on the nonnegative integers, and the distribution of $A_k$ converges to a stationary distribution as $k \to \infty$. Since the distribution $\zeta$ has mean $1/(\alpha-1)$, the expected time for the Markov chain to return to zero is $1/(\alpha-1)$. It follows that
$$\lim_{k \to \infty} P(A_k = 0) = \alpha - 1.$$
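The limit $P(A_k = 0) \to \alpha - 1$ is the renewal theorem: $A_k = 0$ exactly when $k$ is one of the partial sums $S_n$, and $u_k = P(A_k = 0)$ satisfies $u_0 = 1$, $u_k = \sum_{j=1}^k P(\zeta = j)\,u_{k-j}$. Formula (9) is not reproduced in this section, so the law of $\zeta$ used below is our assumption: $P(\zeta = k) = \alpha\,\Gamma(k+1-\alpha)/(\Gamma(2-\alpha)(k+1)!)$, the $n \to \infty$ limit of $\binom{n}{k+1}\lambda_{n,k+1}/\sum_j\binom{n}{j}\lambda_{n,j}$ when $\Lambda(dx) = x^{1-\alpha}dx$. It has mean $1/(\alpha-1)$, as the text requires. Function names are hypothetical.

```python
def zeta_pmf(alpha, kmax):
    """Assumed limit law: P(zeta = k) = alpha Gamma(k + 1 - alpha) / (Gamma(2 - alpha) (k + 1)!), k >= 1.

    Built from P(zeta = 1) = alpha / 2 and P(zeta = k + 1) / P(zeta = k) = (k + 1 - alpha) / (k + 2).
    """
    p = [0.0, alpha / 2.0]
    for k in range(1, kmax):
        p.append(p[k] * (k + 1 - alpha) / (k + 2))
    return p                                    # p[k] = P(zeta = k) for 1 <= k <= kmax

def renewal_probabilities(p, kmax):
    """u[k] = P(A_k = 0) = P(k is one of the partial sums S_0, S_1, ...), via the renewal equation."""
    u = [1.0] + [0.0] * kmax
    for k in range(1, kmax + 1):
        u[k] = sum(p[j] * u[k - j] for j in range(1, k + 1))
    return u

alpha = 1.8
p = zeta_pmf(alpha, 100_000)
print(sum(k * pk for k, pk in enumerate(p)))    # close to 1/(alpha - 1) = 1.25 (tail truncated)
u = renewal_probabilities(p, 1500)
print(u[1500])                                  # close to alpha - 1 = 0.8
```

The heavy tail of $\zeta$ makes the convergence polynomial rather than geometric (roughly like $k^{1-\alpha}$), which is why $\alpha = 1.8$ is used here for a quick check.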

Let $\varepsilon > 0$. Choose $K$ as in Lemma 7.2, so that $P(V_{n,n+k}) > 1 - \varepsilon$ whenever $k \ge K$. Choose an integer $m$ sufficiently large that whenever $k \ge (m-1)K$, we have $|P(A_k = 0) - (\alpha - 1)| < \varepsilon$.

Let $W_n = V_{n+(m-1)K,\, n+mK}$ be the event that $N(t) \in \{n + (m-1)K, \ldots, n + mK\}$ for some $t$. Let $H_n = \max\{j \le n + mK : N(t) = j \text{ for some } t\}$. Note that $H_n \ge n + (m-1)K$ if and only if $W_n$ occurs. Let $(B_k)_{k=2}^\infty$ be a sequence of independent random variables such that $P(B_k = j) = \zeta_{k,j}$ for all positive integers $j \le k - 1$. For nonnegative integers $i$ such that $i \le K$ and positive integers $b_1, b_2, \ldots, b_{mK-i}$, define the function $F_i(b_1, b_2, \ldots, b_{mK-i})$ as follows. Construct a sequence $(x_k)_{k=0}^{mK-i}$ such that $x_0 = 0$ and, for $k \ge 0$, we have $x_{k+1} = x_k + b_{x_k+1}$ if $x_k < mK - i$ and $x_{k+1} = x_k$ otherwise. Then define $F_i(b_1, b_2, \ldots, b_{mK-i})$ to be 1 if $x_j = mK - i$ for some $j$, and 0 otherwise. Thinking of $B_k$ as the number of blocks lost in the next collision if the coalescent has $k$ blocks, we see that, conditional on the event that the coalescent has $n + mK - i$ blocks at






J. Berestycki, N. Berestycki and J. Schweinsberg 



some time, the probability that the coalescent eventually has exactly $n$ blocks is $E[F_i(B_{n+mK-i}, \ldots, B_{n+1})]$. It follows that



$$P(V_n \cap W_n) = \sum_{i=0}^{K} P(H_n = n + mK - i)\, E[F_i(B_{n+mK-i}, \ldots, B_{n+1})]. \tag{46}$$
Note also that $F_i(X_1, X_2, \ldots, X_{mK-i})$ has the same distribution as $1_{\{A_{mK-i} = 0\}}$.

By (9), the distribution of $B_n$ converges to $\zeta$ as $n \to \infty$. Therefore, there is a constant $n_0$ such that for $n \ge n_0$, the sequences $(B_k)_{k=2}^\infty$ and $(X_i)_{i=1}^\infty$ can be coupled so that the probability that $(B_{n+mK-i}, \ldots, B_{n+1}) \ne (X_1, \ldots, X_{mK-i})$ is at most $\varepsilon$ for all $i \le K$. It follows that for $n \ge n_0$, we have
$$\big|E[F_i(B_{n+mK-i}, \ldots, B_{n+1})] - P(A_{mK-i} = 0)\big| \le \varepsilon.$$

Therefore, by our choice of $m$, we have $|E[F_i(B_{n+mK-i}, \ldots, B_{n+1})] - (\alpha - 1)| < 2\varepsilon$. Since $P(W_n) > 1 - \varepsilon$ by our choice of $K$, Eq. (46) now yields, for $n \ge n_0$,

$$P(V_n) \ge P(V_n \cap W_n) \ge P(W_n)(\alpha - 1 - 2\varepsilon) \ge (1 - \varepsilon)(\alpha - 1 - 2\varepsilon)$$

and 

$$P(V_n) \le P(V_n \cap W_n^c) + P(V_n \cap W_n) \le \varepsilon + (\alpha - 1 + 2\varepsilon) = \alpha - 1 + 3\varepsilon.$$
The theorem follows by letting $\varepsilon \downarrow 0$. □
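The function $F_i$ in the proof of Theorem 1.8 only asks whether a walk that loses $b$ blocks at a time lands exactly on a target. We read the recursion as $x_{k+1} = x_k + b_{x_k+1}$ (1-indexed), i.e. `b[x]` in 0-indexed Python; this reading is our assumption. With i.i.d. inputs the walk has the law of the partial sums $S_n$, so $E[F_i]$ should match the renewal probability. The sketch checks this by Monte Carlo, again under the assumed law $P(\zeta=1)=\alpha/2$, $P(\zeta=k+1)/P(\zeta=k)=(k+1-\alpha)/(k+2)$. All helper names are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def hits_level(b, level):
    """Mirror of F_i: b[x] (0-indexed) is the number of blocks lost at the step taken after
    x blocks have been lost; returns 1 if the walk x_0 = 0, x_{k+1} = x_k + b[x_k]
    lands exactly on `level`, else 0."""
    x = 0
    while x < level:
        x += b[x]
    return int(x == level)

def zeta_pmf(alpha, kmax):
    """Assumed limit law: p[1] = alpha / 2, p[k + 1] = p[k] (k + 1 - alpha) / (k + 2)."""
    p = [0.0, alpha / 2.0]
    for k in range(1, kmax):
        p.append(p[k] * (k + 1 - alpha) / (k + 2))
    return p

def renewal_probability(p, level):
    """Exact P(A_level = 0): probability that `level` is a partial sum of i.i.d. zeta variables."""
    u = [1.0] + [0.0] * level
    for k in range(1, level + 1):
        u[k] = sum(p[j] * u[k - j] for j in range(1, k + 1))
    return u[level]

def mc_hit_probability(alpha, level, reps=10000, seed=7):
    """Monte Carlo estimate of E[F(X_1, ..., X_level)] with i.i.d. zeta inputs."""
    rng = random.Random(seed)
    p = zeta_pmf(alpha, level)
    cum = list(accumulate(p[1:]))                  # cum[i] = P(zeta <= i + 1)
    def draw():
        i = bisect_left(cum, rng.random())
        return i + 1 if i < len(cum) else level + 1    # jumps beyond `level` always overshoot
    hits = 0
    for _ in range(reps):
        b = [draw() for _ in range(level)]         # i.i.d. stand-ins for X_1, ..., X_level
        hits += hits_level(b, level)
    return hits / reps

print(mc_hit_probability(1.5, 20), renewal_probability(zeta_pmf(1.5, 20), 20))
```

Lumping every jump larger than the target into a single overshoot value does not change whether the target is hit, so the truncated sampler is exact for this purpose.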

Proof of Theorem 1.9. Recall that $T_k = \inf\{t : N(t) \le k\}$. Note that the distribution of $L_n$ is the same as the conditional distribution of
$$\int_{T_n}^{T_1} N(s)\, ds$$
given $N(T_n) = n$. Suppose $g: (0, \infty) \to (0, \infty)$ is a nonincreasing function such that $g(t) \sim C t^{-\gamma}$ for some $\gamma > 1$, where $\sim$ means that the ratio of the two sides tends to one as $t \downarrow 0$. Then it is straightforward to verify that for any $D > 0$, we have
$$\int_t^D g(s)\, ds \sim \frac{C}{\gamma - 1}\, t^{1-\gamma}. \tag{47}$$
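By Theorem 1.1, we can apply (47) with $g(t) = N(t)$, $C = [\alpha/(A\Gamma(2-\alpha))]^{1/(\alpha-1)}$, and $\gamma = 1/(\alpha-1)$ to get
$$\int_t^{T_1} N(s)\, ds \sim \frac{\alpha - 1}{2 - \alpha} \left[\frac{\alpha}{A\Gamma(2-\alpha)}\right]^{1/(\alpha-1)} t^{(\alpha-2)/(\alpha-1)} \quad \text{a.s.} \tag{48}$$
Theorem 1.1 also implies that $T_n \sim [\alpha/(A\Gamma(2-\alpha))]\, n^{1-\alpha}$ a.s., where $\sim$ means that the ratio of the two sides tends to 1 as $n \to \infty$. Combining this observation with (48), we get
$$\lim_{n \to \infty} \frac{1}{n^{2-\alpha}} \int_{T_n}^{T_1} N(s)\, ds = \frac{\alpha(\alpha-1)}{A\Gamma(2-\alpha)(2-\alpha)} \quad \text{a.s.} \tag{49}$$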
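The step to (49) is exponent bookkeeping: with $c = \alpha/(A\Gamma(2-\alpha))$, $C = c^{1/(\alpha-1)}$ and $\gamma = 1/(\alpha-1)$, evaluating $\frac{C}{\gamma-1}t^{1-\gamma}$ at $t = c\,n^{1-\alpha}$ leaves exactly $n^{2-\alpha}$ times the constant in (49). The sketch below verifies this numerically for arbitrary illustrative $\alpha$ and $A$, and evaluates the Beta$(2-\alpha,\alpha)$ case, whose density behaves like $A x^{1-\alpha}$ near 0 with $A = 1/(\Gamma(\alpha)\Gamma(2-\alpha))$, so the constant becomes $\alpha(\alpha-1)\Gamma(\alpha)/(2-\alpha)$. Function names are hypothetical.

```python
import math

def limit_constant(alpha, A):
    """Right-hand side of (49): alpha (alpha - 1) / (A Gamma(2 - alpha) (2 - alpha))."""
    return alpha * (alpha - 1.0) / (A * math.gamma(2.0 - alpha) * (2.0 - alpha))

def plug_in(alpha, A, n):
    """(C / (gamma - 1)) t**(1 - gamma) from (47), evaluated at t = c n**(1 - alpha) and divided
    by n**(2 - alpha), with c = alpha / (A Gamma(2 - alpha)), C = c**(1/(alpha - 1)), gamma = 1/(alpha - 1)."""
    c = alpha / (A * math.gamma(2.0 - alpha))
    C = c ** (1.0 / (alpha - 1.0))
    g = 1.0 / (alpha - 1.0)
    t = c * n ** (1.0 - alpha)
    return C / (g - 1.0) * t ** (1.0 - g) / n ** (2.0 - alpha)

alpha = 1.5
A_beta = 1.0 / (math.gamma(alpha) * math.gamma(2.0 - alpha))   # Beta(2 - alpha, alpha): density ~ A x^(1-alpha) near 0
print(limit_constant(alpha, A_beta), alpha * (alpha - 1.0) * math.gamma(alpha) / (2.0 - alpha))
```

The $n$-dependence cancels identically, which is why (49) holds with an almost sure limit rather than only along subsequences.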

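Let $\varepsilon > 0$. By (49) and Theorem 1.8, there exists an $M$ such that if $n > M$, then $P(V_n) = P(N(T_n) = n) > (\alpha - 1)/2$ and
$$P\left(\left|\frac{1}{n^{2-\alpha}} \int_{T_n}^{T_1} N(s)\, ds - \frac{\alpha(\alpha-1)}{A\Gamma(2-\alpha)(2-\alpha)}\right| > \varepsilon\right) < \frac{\varepsilon(\alpha-1)}{2}.$$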


Small-time behavior of beta coalescents 






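Since $L_n$ has the conditional distribution of $\int_{T_n}^{T_1} N(s)\, ds$ given $V_n$, for $n > M$ we have
$$P\left(\left|\frac{L_n}{n^{2-\alpha}} - \frac{\alpha(\alpha-1)}{A\Gamma(2-\alpha)(2-\alpha)}\right| > \varepsilon\right) < \frac{\varepsilon(\alpha-1)/2}{(\alpha-1)/2} = \varepsilon.$$
The result follows. □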